INTRODUCTION


SRI International (SRI) is pleased to present this final report under Contract
F30602-93-C-0071, Machine Learning for Military Operations Planning, to Rome Laboratory
(RL) of the United States Air Force (USAF) and the Defense Advanced Research Projects Agency
(DARPA). The work described is part of the DARPA/RL Planning Initiative (ARPI); specifically,
this report describes the progress made during this contract to develop intelligent tools for acquiring,
refining, and validating knowledge bases for a military operations planning system.

As with most large-scale AI projects, one of the most significant obstacles to developing
intelligent decision support systems is knowledge engineering. Effective reasoning about
large-scale, real-world problems requires large quantities of domain knowledge.
Therefore, tools for knowledge acquisition must be developed before fully operational crisis
response planning systems can be built. We use the term knowledge acquisition,
rather than knowledge engineering, to distinguish our goal (building tools that take an active role
in acquiring knowledge) from the more traditional approach, in which an AI expert creates a knowledge
representation and encodes the knowledge base by hand.

It is our view that knowledge acquisition tools should (1) enable human planners to transfer
their expertise to the system, (2) support the acquisition of knowledge from on-line sources, and
(3) integrate information from the range of available sources, including human experts, simulators,
on-line databases, training exercises, and actual crises. To meet these needs, SRI has developed the
Knowledge Acquisition Toolkit (KATY), a package of knowledge acquisition tools for the System
for Interactive Planning and Execution (SIPE-2), SRI’s generative planning system. KATY includes
three knowledge editing tools (the Operator Editor, Object Editor, and Predicate Editor) and two
machine learning tools (the Operator Learner and the Probabilistic Autonomous GOal-Directed
Agent [PAGODA], an inductive learning system).

The graphical Operator Editor allows users to develop new planning operators and revise
existing operators. This tool also supports editing of the object hierarchy via an interface from
SIPE-2 to SRI’s Generic Knowledge Base (GKB) Editor.* The Predicate Editor allows users to view,
add, and modify the predicates that define the world state.

The Operator Learner uses the PAGODA learning model [desJardins 1992] to acquire planning
operators via feedback from simulators and from the user’s planning processes. The user first
enters partially specified operators that reflect an initial rough description of how a subgoal may
be solved. The Operator Learner then “fills in the blanks,” identifying appropriate preconditions
and temporal constraints on the application of the operator. Thus, the user contributes expertise,
while the automated learning tool performs much of the tedious work of developing preconditions
and identifying precisely when it is appropriate to apply a particular operator.

PAGODA, which was originally developed as part of Dr. Marie desJardins’s dissertation
research [desJardins 1992], has been enhanced under this contract. Specifically, SRI made a
number of extensions and improvements to PAGODA to support the inductive learning required for
the Operator Learner, and developed a generic interface for describing input features and training
examples.

*The GKB Editor was developed by SRI under Contract F30602-94-C-0263 to Rome Laboratory, Generic
Knowledge Base Browser and Editor.
This report describes the progress made on the development of KATY during the 3-year
contract. Section 1 summarizes our accomplishments. Section 2 defines the problems we
addressed and the rationale for our solutions. KATY and its components are described in Section 3.
Section 4 describes the various simulators that we reviewed in developing the current
demonstration scenario. Evaluation metrics are discussed in Section 5. Related work is surveyed
in Section 6. Section 7 contains a list of publications written under this contract. In Section 8 we
present our conclusions and describe future work. In Section 9 we list referenced documents.


1 SUMMARY OF ACCOMPLISHMENTS

Our accomplishments during this contract are listed below and are described in detail in the
cited sections.

• We ported PAGODA, an inductive machine learning system [desJardins 1992], from
ZetaLisp/Flavors* into Lucid Common Lisp/Common Lisp Object System (CLOS).

The system now runs on any standard Common Lisp and/or CLOS platform. We also
rewrote substantial parts of the system to increase its efficiency and generality
(Subsection 3.2.5).

• We implemented an Operator Editor (Subsection 3.1) based on SRI’s Act Editor
[Wilkins et al. 1994]. Many of our extensions of the Operator Editor have been
incorporated into the Act Editor.

• We implemented an Operator Learner that uses qualitative constraints on a partial
operator to create a series of experiments, use those experiments to generate plans,
evaluate the plans, extract training instances, and apply inductive methods to the
instances in order to learn preconditions for operator application (Subsection 3.2).

We extended this system to support learning from the user’s choices by creating
training examples for each operator choice the user makes during planning
(Subsection 3.2.5).

• We developed and implemented an abstraction language for qualitative constraints
that allows the user to specify what information is relevant to the success of the
operator, without actually writing specific preconditions (Subsection 3.2.1).

• We developed a generic input language for PAGODA (Subsection 3.2.5). This
language enables PAGODA to accept training examples in a range of formats, and can
also serve as a generic interface from the Operator Learner to other inductive
learning systems.

• We created demonstration scenarios for the military and oil-spill application
domains, and explored a number of simulators and evaluation tools in the military
domain (Section 4).

• We ported SOCAP† and KATY to Allegro Common Lisp and CLIM 2.0.


*All project and company names mentioned in this document are the trademarks of their respective holders.
†SOCAP: System for Operations Crisis Action Planning [Bienkowski 1995].
• We created a World Wide Web home page for the project
(http://www.erg.sri.com/people/marie/papers/ml-summary.html).

• We presented project-related talks at the 1994 IEEE Conference on Tools with AI
[desJardins 1994b], the 1994 Fall Symposium on Planning and Learning [desJardins
1994c], the 1994 Fall Symposium on Relevance [desJardins 1994d], the 1995 AAAI
Fall Symposium on Active Learning, Stanford University, Carnegie Mellon
University, the University of Massachusetts, Rome Laboratory, and several ARPI
workshops.


2 MOTIVATION

One of the most time-consuming and critical tasks in the development of crisis response
planning systems is knowledge engineering. For example, a significant part of the development
effort for SOCAP [Bienkowski 1995], a prototype military operations planning system based on the
AI generative planning system SIPE-2, consisted of writing and debugging planning operators.
This process required an AI expert; it would have been difficult to teach an AI-naive domain expert
how to write and debug planning operators using the few tools that SIPE-2 provided [Desimone et
al. 1993].

Our experience suggested, however, that given the appropriate tools, human planners who
were not AI experts could construct knowledge bases for AI planning systems. These tools should
guide the user through the operator development and debugging process, and should embody the
expertise about the planning representation that currently must be provided by an AI knowledge
engineer. These tools should also support both the construction of the initial knowledge base and
the updating of the knowledge over the life of the planning system. Continuous updating of the
knowledge base is essential to ensure that the planning system is not constrained by a pre-existing
knowledge base and can be used for unforeseen types of operations and situations. The tools will
be used for knowledge acquisition and will also enable the system to learn, that is, to improve its
performance over time.


3 KATY

Each operator in SIPE-2 specifies a method for achieving a single mission or task, including
the actions required, preconditions for using the specified method, temporal constraints among the
actions, and expected effects of the actions. Previously, these operators had to be developed
manually by AI planning experts who edited ASCII descriptions of the operators. This process was
tedious, prone to errors, and difficult for users who were not AI experts.
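To make the list of operator contents above concrete, the following Python sketch models the information an operator carries: a purpose (the task it achieves), preconditions, plot nodes for actions and subgoals, temporal ordering constraints, and expected effects. It is only an illustration; the class and field names are assumptions and do not reflect SIPE-2's actual Lisp representation or its operator syntax.

```python
from dataclasses import dataclass, field

# Illustrative convention (not SIPE-2 syntax): a predicate is a tuple
# (name, arg1, arg2, ...).
Predicate = tuple


@dataclass
class PlotNode:
    """One action or subgoal in an operator's plot."""
    name: str
    kind: str  # "goal" (a subgoal) or "process" (an action)
    effects: list = field(default_factory=list)


@dataclass
class Operator:
    """A method for achieving a single mission or task."""
    name: str
    purpose: Predicate
    preconditions: list = field(default_factory=list)
    nodes: dict = field(default_factory=dict)      # node name -> PlotNode
    orderings: list = field(default_factory=list)  # temporal constraints as (earlier, later)

    def add_node(self, node: PlotNode) -> None:
        self.nodes[node.name] = node

    def expected_effects(self) -> list:
        """Collect the expected effects of every node in the plot."""
        return [e for node in self.nodes.values() for e in node.effects]


# Illustrative values only.
op = Operator("get-boom", purpose=("level>=", "vessel1", "boom-level1", "numerical1"))
op.preconditions.append(("in-service", "boom1"))
op.add_node(PlotNode("move-boom", "goal", effects=[("located", "boom1", "sea-sector1")]))
op.add_node(PlotNode("deploy-boom", "process", effects=[("boom-deployed", "boom1", "vessel1")]))
op.orderings.append(("move-boom", "deploy-boom"))
print(op.expected_effects())
```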

KATY is a package of knowledge acquisition tools that simplifies this process and reduces the
likelihood of errors. If extended through future development, these tools will eventually allow
AI-naive domain experts to transfer their knowledge directly to SIPE-2.
KATY provides two types of tools: knowledge editors and knowledge refiners. Knowledge
editors provide graphical editing functions for creating and modifying planning knowledge. KATY
includes tools for editing operators (the Operator Editor), class hierarchies (the Object Editor), and
knowledge about the state of the world (the Predicate Editor). Knowledge refiners automatically or
interactively analyze and improve the knowledge base. KATY’s Operator Learner is a knowledge
refiner that learns the preconditions for applying planning operators by using evaluation feedback
from a simulator, an automated evaluation tool, or the domain expert’s planning choices.

Several critical issues are associated with knowledge acquisition tools in general, and tools
based on automated learning in particular:

1. It is difficult to get domain experts to understand and use new tools.

2. Automated learning methods must rely on possibly inaccurate or incomplete data.

3. Training data may be difficult and expensive to collect, and training instances may
be quite large in complex domains.

Consideration of these issues led us to define three desiderata for KATY:

1. Graphical interfaces and semiautomated verification techniques should be used to
simplify user training.

2. Data from a variety of sources, including multiple simulators and evaluation tools as
well as user behavior, should be incorporated into KATY to reduce uncertainty in the
acquired knowledge.

3. Partial knowledge and other guidance provided by the user should be used to reduce
the size and number of required training instances.

In the first year of this effort, we implemented the Operator Editor. This graphical editing tool
for planning operators enables users to develop new operators and edit existing ones. An intelligent
interface guides users through the development process, ensuring that the knowledge is in the
correct form. In the second year, we implemented the Operator Learner, an inductive learning
module that tests and refines the partial operators created in the editor. In the third and final year of
the project, we extended the inductive learning system, improved the experiment generation and selection
techniques of the Operator Learner, and extended the Operator Learner to observe and learn from
expert planning behavior.

3.1 OPERATOR EDITOR

The Operator Editor provides a graphical interface (shown in Figure 1) for creating and
modifying SIPE-2 planning operators. The plot of the operator is displayed as a graph, with nodes
that correspond to actions and subgoals, and edges that indicate temporal relationships among these
nodes. The other information associated with the operator can be displayed as buttons or text fields
(like the preconditions in Figure 1).

The Operator Editor has four significant advantages over the methods previously provided in
SIPE-2 for developing operators. First, the operators are easier to understand because they are
displayed and edited graphically, rather than as ASCII text. Second, all editing is done via
templates, so syntactic errors cannot be introduced. Third, a data dictionary is
maintained to ensure that all classes, variables, and predicates entered are in the correct format.
[Figure 1 image: the Operator Editor window for the DEPLOY-VIA-SEAPORT operator, with buttons for PURPOSE, PRECONDITIONS, SETTING, RESOURCES, PROPERTIES, COMMENT, EFFECTS, and VARIABLES, and the operator's preconditions displayed in a text field.]

Figure 1. Operator Editor Graphical Display


Finally, developing the plot as a graph rather than as a text description of the branches in the plot
is less prone to error, especially since the Operator Editor automatically maintains the parallel
structure of the graph.

The method we have used to build the Operator Editor can best be described as evolutionary
development. We used the Operator Editor at SRI to develop operators for the military planning
domain and for a United States Coast Guard (USCG) project in which SIPE-2 was applied to
oil-spill response planning [Desimone and Agosta 1993]. We also introduced additional
functionality and updated the user interface, based on feedback from the ongoing usage of the
editor. We plan to use the editor more extensively in a new NRaD*-funded project to apply SIPE-2
to maritime crisis action planning. The editor will thus continue to evolve in response to
requirements generated by its actual users.


*NRaD: Naval Command, Control, and Ocean Surveillance Center (NCCOSC) Research, Development, Test, and
Evaluation Division.
The Operator Editor is based on SRI’s Act Editor. The Act Editor uses the Act representation
for procedural knowledge, which subsumes SIPE-2’s operator representation. Because the Act
representation used in the editor is different from SIPE-2’s internal representation, operators must
be translated back and forth between the two representations. (This translation is done
automatically by the software.) We have tailored the Operator Editor to users who are familiar with
SIPE-2’s representation but do not necessarily understand the details of the Act representation. For
the remainder of this report we will refer to “operators,” although in the editor they are internally
represented as Acts.

The Operator Editor can be used during planning and replanning, as well as during knowledge
development. In particular, whenever SIPE-2 fails to solve a problem, the user has the option of
entering the Operator Editor. Any operators that the user adds or modifies during the Operator
Editor session are made available to SIPE-2 when the editor is exited. For example, if SIPE-2 fails
to solve a goal because no operator has a purpose that matches the goal, the user can enter the
Operator Editor, build an operator that represents a subplan for solving that goal, and return to
SIPE-2; the new operator will be used to expand the goal in the plan without any need for
backtracking.
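The failure case above arises when no operator's purpose matches the goal. A minimal Python sketch of that check follows. It assumes a hypothetical convention in which operator variables are strings beginning with '?'; SIPE-2's own matcher is richer (it also checks object classes), so this only illustrates the idea.

```python
def match_purpose(purpose, goal):
    """Return variable bindings if an operator's purpose matches a goal, else None.

    Hypothetical convention: variables are strings starting with '?'.
    Object classes are not checked in this sketch.
    """
    if len(purpose) != len(goal) or purpose[0] != goal[0]:
        return None
    bindings = {}
    for pattern, value in zip(purpose[1:], goal[1:]):
        if isinstance(pattern, str) and pattern.startswith("?"):
            if bindings.get(pattern, value) != value:
                return None  # the same variable would need two different values
            bindings[pattern] = value
        elif pattern != value:
            return None
    return bindings


def applicable_operators(operators, goal):
    """List (operator name, bindings) for every operator whose purpose matches the goal."""
    matches = []
    for name, purpose in operators.items():
        bindings = match_purpose(purpose, goal)
        if bindings is not None:
            matches.append((name, bindings))
    return matches


operators = {"deploy-via-seaport": ("located", "?force", "?destination")}
goal = ("cleaned-up", "spill1")
if not applicable_operators(operators, goal):
    # At this point the user could be offered the Operator Editor.
    print("No operator purpose matches", goal)
```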

The Operator Editor provides consistency checking and type checking throughout operator
development. The goal is to give users as much assistance as possible
without constraining them to a particular model of operator development.

We developed an abstraction language for expressing qualitative constraints (QCs) (see
Subsection 3.2.1) and used this language to extend the Operator Editor so that QCs can be attached to
operators. Users can thus develop partial operator descriptions that can be fed into the inductive
learning tool. The QCs shown in Figure 2 are examples of abstract preconditions that must be
instantiated by the machine learning system. For example, the second QC states that the sea-state
(roughness of the water) in the sea sector where an operation is being performed
(sea-sector.1), at the time of the operation (latest.1), is relevant to the success of the
operation. The asterisk (*) indicates that the sea-state value is unknown (i.e., the range of
acceptable values must be determined by the inductive learning system). We will return to this
example in Subsection 3.2.

All of the fields in an operator are displayed graphically on the screen. Normally, the plot
nodes and edges are drawn as a graph, and the other fields (purpose, preconditions, etc.) appear as
buttons. Left-clicking* on any button, node, or edge causes a description of the contents of that
object to be printed in the interaction window. Middle-clicking on an object copies it into an edit
buffer, from which it can be pasted into another operator. Right-clicking edits the object.

We have developed a toolkit of CLIM-based dialog boxes that can be used to build templates
for editing a variety of objects (an example is shown in Figure 3). Each field in a dialog box has an
associated type that defines the set of legal completions, which the user can access as a menu of
choices. If the user enters an object that is not in this set, the system either signals an error or
defines a new object, depending on the context. Each domain’s knowledge base has an associated
data dictionary, which is built automatically by the Operator Editor and contains the set of known
predicates (along with their arities and argument types), classes, and objects. This data dictionary
is used to generate the completion sets in the dialog boxes.
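As a rough illustration of how a data dictionary can drive both completion menus and type checking, here is a Python sketch. The DataDictionary class and its methods are hypothetical; class membership is checked exactly (subclasses are ignored), and this is not the Operator Editor's CLIM implementation.

```python
class DataDictionary:
    """Known predicates (with arities and argument types), classes, and objects.

    Simplification: class membership is exact; subclasses are ignored.
    """

    def __init__(self):
        self.predicates = {}  # predicate name -> tuple of argument classes
        self.objects = {}     # object name -> class name

    def add_predicate(self, name, arg_types):
        self.predicates[name] = tuple(arg_types)

    def add_object(self, name, cls):
        self.objects[name] = cls

    def completions(self, field_type, prefix=""):
        """Legal completions offered as a menu for a field of the given type."""
        if field_type == "predicate":
            names = self.predicates.keys()
        else:
            names = [obj for obj, cls in self.objects.items() if cls == field_type]
        return sorted(n for n in names if n.startswith(prefix))

    def enter(self, field_type, value, allow_new=False):
        """Accept a typed value, define it as a new object, or signal an error."""
        if value in self.completions(field_type):
            return value
        if allow_new and field_type != "predicate":
            self.add_object(value, field_type)
            return value
        raise ValueError(f"{value!r} is not a known {field_type}")

    def check(self, pred):
        """Type-check a predicate instance against its declared arity and argument types."""
        name, *args = pred
        if name not in self.predicates:
            raise ValueError(f"unknown predicate {name!r}")
        types = self.predicates[name]
        if len(args) != len(types):
            raise ValueError(f"{name} expects {len(types)} arguments, got {len(args)}")
        for arg, cls in zip(args, types):
            if self.objects.get(arg) != cls:
                raise ValueError(f"{arg!r} is not a {cls}")


dd = DataDictionary()
dd.add_predicate("in-service", ["boom"])
dd.add_object("boom1", "boom")
dd.add_object("boom2", "boom")
print(dd.completions("boom", prefix="boom"))  # menu of legal completions
dd.check(("in-service", "boom1"))             # passes silently
dd.enter("boom", "boom3", allow_new=True)     # defines a new boom object
```

Whether an unknown entry is rejected or defined as a new object is controlled here by the allow_new flag, which stands in for the context-dependent behavior described above.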

*The phrases “left-clicking,” “middle-clicking,” and “right-clicking” mean clicking the left, middle, and right
mouse buttons, respectively.
OPERATOR: ml-get-boom1
    ARGUMENTS: vessel1, boom-level1, numerical1, sea-sector1,
               sea-state1, boom1, numerical2 is (length-boom-ft boom1);
    INSTANTIATE: boom1;
    PURPOSE: (level>= vessel1 boom-level1 numerical1);
    PRECONDITION: (in-service boom1);
    Properties:
        NONLOCAL-VARS = (sea-state.1 sea-sector.1 latest.1),
        QC = ((%property max-sea-state boom.1 *)
              (sea-state sea-sector.1 latest.1 *));
    PLOT:
      PARALLEL
        BRANCH 1:
          GOAL
            GOALS: (located boom1 sea-sector1);
            RESOURCES: boom1;
          PROCESS
            ACTION: deploy-boom;
            ARGUMENTS: boom1, sea-sector1, vessel1, boom-level1,
                       numerical2;
            RESOURCES: boom1;
            EFFECTS: (boom-deployed boom1 vessel1),
                     (produce vessel1 boom-level1 numerical2);
        BRANCH 2:
          GOAL
            GOALS: (level>= vessel1 boom-level1 numerical1);
            ARGUMENTS: vessel1, boom-level1, numerical1, sea-sector1,
                       sea-state1;
      END PARALLEL
    END PLOT
END OPERATOR


Figure 2. ML-Get-Boom1 Operator

Dialog boxes are used during the execution of menu- and mouse-based editing commands to
add and delete nodes, edges, and values in the fields of the operator (e.g., preconditions and
resources). Additionally, the user can reposition nodes in the graphical display and can toggle the
appearance of operator fields (which can be viewed as buttons or as text boxes in the graphical
display).

When the user adds or deletes an edge, the graph structure required by SIPE-2 is maintained
automatically. In particular, no cycles are permitted, and whenever a node has two successors,
there must be an intervening Split node and a corresponding Join node at the ends of the branches.
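The no-cycles rule can be enforced by refusing any new edge whose target already reaches its source. The Python sketch below, built around a hypothetical PlotGraph class, shows that check and also reports which nodes have two or more successors and would therefore need a Split node and a matching Join node. Unlike the editor, it does not insert those nodes itself.

```python
class PlotGraph:
    """Ordering edges between plot nodes, kept acyclic as edges are added."""

    def __init__(self):
        self.succ = {}  # node -> set of successor nodes

    def reachable(self, start, target):
        """Depth-first search: can target be reached from start?"""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.succ.get(node, ()))
        return False

    def add_edge(self, a, b):
        """Add edge a -> b unless it would create a cycle (b already reaches a)."""
        if self.reachable(b, a):
            raise ValueError(f"edge {a} -> {b} would create a cycle")
        self.succ.setdefault(a, set()).add(b)

    def split_points(self):
        """Nodes with two or more successors; each needs a Split node
        (and a corresponding Join node where the branches end)."""
        return sorted(n for n, s in self.succ.items() if len(s) >= 2)


g = PlotGraph()
for a, b in [("start", "goal-1"), ("start", "deploy"), ("goal-1", "end"), ("deploy", "end")]:
    g.add_edge(a, b)
print(g.split_points())          # ['start']
try:
    g.add_edge("end", "start")   # would close a loop
except ValueError as err:
    print(err)
```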
Figure 3. Dialog Box


3.2 OPERATOR LEARNER

The Operator Learner refines the partial operators created by the user in the Operator Editor
by learning the preconditions that the planning system uses to determine when the operators can
be applied.

The Operator Learner uses an operator’s qualitative constraints (QCs, the partial knowledge
specified by the user) to generate a series of experiments, each of which specifies a set of
constraints on the planning process. By varying these constraints, the Operator Learner tests the
quality of the plans produced by using the operator under a range of conditions. An external
module evaluates each plan, and the Operator Learner again uses the QCs to extract training
instances describing the world state, plan, and evaluation result.

PAGODA analyzes these training instances to create a hypothesis of the conditions under
which the operator is expected to succeed. The result is interpreted by the Operator Learner as a
precondition for the operator, and is incorporated into the operator definition, subject to the expert
user’s approval.
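PAGODA's actual hypotheses are probabilistic theories, which are beyond a short example. As a deliberately simpler stand-in (not PAGODA's method), the Python sketch below learns the most specific interval of one numeric feature that covers all successful training instances, and counts the failed instances that fall inside it anyway. A result like this is the kind of candidate precondition that would be shown to the user for approval. The training data are made up for illustration.

```python
def learn_interval(instances, feature):
    """Most specific interval covering the feature's value in every successful instance.

    Returns (low, high, exceptions), where exceptions counts failed instances whose
    value also falls inside the interval, or None if there are no successes.
    """
    successes = [x[feature] for x in instances if x["success"]]
    if not successes:
        return None
    low, high = min(successes), max(successes)
    exceptions = sum(1 for x in instances
                     if not x["success"] and low <= x[feature] <= high)
    return low, high, exceptions


# Made-up training instances for illustration only.
instances = [
    {"sea_state": 1, "success": True},
    {"sea_state": 2, "success": True},
    {"sea_state": 3, "success": True},
    {"sea_state": 2, "success": False},
    {"sea_state": 5, "success": False},
]
print(learn_interval(instances, "sea_state"))  # (1, 3, 1)
```

A nonzero exception count signals that the data are noisy or that another feature matters, which is one reason a probabilistic learner such as PAGODA is used instead of a strict interval rule.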

The system learns preconditions via feedback from a simulator or by observing the user’s
planning choices. The simulator feedback tells the system which actions succeeded and how long
they took to complete. What is learned in this case is a description of a particular
operator’s success in the simulator (i.e., the world state in which the operator succeeds); therefore,
the learned knowledge is only as accurate and complete as the simulation. However, if multiple
simulators are used, their “domains of expertise” could be combined for completeness.
The user’s planning choices are indications of the user’s belief that a particular operator will
succeed. If the user chooses operator A over operator B in a particular context, it is assumed that
operator A is “successful” in that context and operator B “fails.” Typically, the user has knowledge
that is not captured in the system (which is why the preferred mode of the planning system is
interactive). Learning methods based on observing the user allow us to make explicit some of this
previously unrepresented knowledge. This process and some problems that arise in its application
are discussed in Subsection 3.2.5.
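The labeling rule just described can be sketched in a few lines of Python. The function name and the example format are hypothetical. Each decision yields one positive example for the chosen operator and one negative example for each rejected alternative, all sharing the same planning context.

```python
def examples_from_choice(context, chosen, candidates):
    """Turn one planning decision into labeled training examples.

    The operator the user chose is labeled a success in this context, and every
    other candidate operator is labeled a failure, per the assumption above.
    """
    return [
        {"operator": op, "context": dict(context), "success": op == chosen}
        for op in candidates
    ]


# Illustrative decision: the user picked operator A over operator B.
examples = examples_from_choice({"sea_state": 4}, chosen="A", candidates=["A", "B"])
for ex in examples:
    print(ex["operator"], ex["success"])
```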

The architecture of the Operator Learner is shown in Figure 4. The Operator Learner is
composed of four major processes (Operator Creation, Operator Refinement, Experiment
Generation, and Data Generation), each with one or more subprocesses. These processes are
described in the following subsections.

3.2.1 Operator Creation

The Operator Editor is used to create one or more operators with partial preconditions,
represented as QCs. The QCs capture user-provided guidance, which may be incomplete, and are
used to guide the learning process. By expressing QCs, users can intuitively specify
abstract constraints on the operator (e.g., by specifying which properties are relevant for determining
success), even when they cannot precisely and completely specify the actual constraints. Some
examples of QCs follow:

• “Bad weather usually delays transportation actions, and air movements are more
likely to be delayed than sea movements.”

• “The equipment required to clean up an oil spill depends on the type and amount of
oil, weather, water currents, and response time.”

• “Republicans are less likely than Democrats to vote for social programs.”

None of these QCs are precise enough to be used as preconditions by the planning system, but they
significantly constrain the space of possible preconditions, making the automated learning of
preconditions via refinement of the QCs computationally feasible. In the current KATY
implementation, QCs represent generalizations of preconditions (predicates with underspecified
arguments). The system fills these in by systematically varying the values of the arguments (see
Subsection 3.2.3).

The QCs of an operator are specified in its properties slot, and can be created and modified
in the Operator Editor. Each QC matches SIPE-2 predicates or properties of objects in the SIPE-2
sort hierarchy. Arguments to a QC can be a wild card (*) that matches anything, arguments
(variables) of the operator that contains the QC, or objects or classes (the latter match any object
of that class). Negated predicates are allowed.
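The matching rules for QC arguments can be sketched in Python as follows. Everything here is an illustrative assumption rather than KATY's code: predicates are tuples, object properties are stored as facts of the form ('%property', property, object, value), a negated QC is written ('not', qc), operator variables appear in a bindings table (None meaning not yet bound), and each class has at most one parent. The example uses the two QCs of the boom operator in Figure 2 against a made-up world state.

```python
from dataclasses import dataclass, field

WILD = "*"


@dataclass
class World:
    """World state plus a simple single-parent class hierarchy (illustrative)."""
    facts: set                                       # predicate and property tuples
    instance_of: dict = field(default_factory=dict)  # object -> class
    parent: dict = field(default_factory=dict)       # class -> superclass

    def classes(self):
        return set(self.instance_of.values()) | set(self.parent) | set(self.parent.values())

    def is_a(self, obj, cls):
        c = self.instance_of.get(obj)
        while c is not None:
            if c == cls:
                return True
            c = self.parent.get(c)
        return False


def match_arg(pattern, value, bindings, world):
    if pattern == WILD:
        return True                               # wild card matches anything
    if pattern in bindings:                       # a variable of the operator
        bound = bindings[pattern]
        return bound is None or bound == value    # an unbound variable matches anything
    if pattern in world.classes():
        return world.is_a(value, pattern)         # a class matches any of its members
    return pattern == value                       # a specific object or constant


def match_qc(qc, bindings, world):
    """Return the facts that instantiate a QC. A negated QC ("not", inner)
    returns [()] (one empty match) if nothing matches inner, else []."""
    if qc[0] == "not":
        return [] if match_qc(qc[1], bindings, world) else [()]
    return sorted(
        fact for fact in world.facts
        if fact[0] == qc[0] and len(fact) == len(qc)
        and all(match_arg(p, v, bindings, world) for p, v in zip(qc[1:], fact[1:]))
    )


world = World(
    facts={("sea-state", "sector1", "t1", 3), ("sea-state", "sector2", "t1", 5),
           ("%property", "max-sea-state", "boomA", 4),
           ("%property", "max-sea-state", "boomB", 2)},
    instance_of={"boomA": "boom", "boomB": "boom"},
)
bindings = {"sea-sector.1": "sector1", "latest.1": "t1", "boom.1": None}
print(match_qc(("sea-state", "sea-sector.1", "latest.1", WILD), bindings, world))
print(match_qc(("%property", "max-sea-state", "boom.1", WILD), bindings, world))
```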

For example, the QCs of the operator ml-get-boom1, shown in Figure 2, are (%property
max-sea-state boom.1 *) and (sea-state sea-sector.1 latest.1 *). The first QC
refers to the max-sea-state property of the variable boom.1 (the most severe level of ocean
conditions under which the boom is effective). By giving the asterisk (*) as an argument, the user
has indicated that max-sea-state can take on any value, so the Operator Learner will have to
use its learning methods to identify the correct value(s). The second QC indicates that the
sea-state predicate (ocean conditions) for sea-sector.1 (the location of the operation) at
time latest.1 is relevant. Again, * indicates that the value has not been constrained, so the
system must learn the correct value.
3.2.2 Operator Refinement

The Operator Refinement process provides the overall control for the Operator Learner. The
Learning Control subprocess decides which operator to learn (currently, this decision is made when
the user explicitly invokes the learning system) and calls the Experiment Generation process. The
Inductive Learning subprocess provides a generic interface to an external inductive learning
module. We are currently using PAGODA (Subsection 3.2.5) as this module. However, this
interface has been developed in a general, transparent way that permits the use of other inductive
learning software.

After inductive learning takes place, the learning system must decide whether to continue
generating experiments or to stop learning. Learning should stop when one or more
acceptable preconditions based on the QCs have been learned, or when none could be found. At
present, the user makes this decision manually, after the system asks whether to continue.
If more learning is required, another experiment is selected and the loop continues. If no more
learning is to be done, the learned preconditions are passed back to the Learning Control process.
The preconditions are then presented to the user, who indicates whether or not to add them to the
operator and adjusts the operator’s QCs. Again, this process is currently manual; an open problem
is to develop methods for automating it. Finally, the updated operators are stored in the
domain knowledge base.
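The control flow just described can be summarized as a loop. In the Python sketch below, every hook (running an experiment, inducing preconditions, and the two user decisions) is a hypothetical callable standing in for the corresponding KATY component or user interaction, and the toy hooks at the end exist only to show the loop running.

```python
def refine_operator(operator, experiments, run_experiment, induce, ask_continue, ask_accept):
    """Sketch of the Operator Refinement control loop (all hooks are hypothetical).

    run_experiment(exp) -> training instances from one plan and its evaluation
    induce(instances)   -> candidate preconditions (PAGODA's role in KATY)
    ask_continue(hyp)   -> the user's decision whether to keep learning
    ask_accept(hyp)     -> the user's approval to add the preconditions
    """
    instances, hypothesis = [], []
    for exp in experiments:
        instances.extend(run_experiment(exp))
        hypothesis = induce(instances)
        if not ask_continue(hypothesis):
            break
    if hypothesis and ask_accept(hypothesis):
        operator.setdefault("preconditions", []).extend(hypothesis)
    return operator  # the caller stores the updated operator in the knowledge base


# Toy hooks for illustration: an experiment "succeeds" whenever x < 3.
def run(x):
    return [{"x": x, "success": x < 3}]


def induce(instances):
    ok = [i["x"] for i in instances if i["success"]]
    return [("max-x", max(ok))] if ok else []


op = refine_operator({"name": "toy"}, [1, 2, 3], run, induce,
                     ask_continue=lambda h: True, ask_accept=lambda h: True)
print(op["preconditions"])  # [('max-x', 2)]
```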

3.2.3 Experiment Generation

Using the QCs in the operator(s) to be learned, the Generate Experiments subprocess
generates a list of experiments. Each experiment represents a set of constraints to be applied during
the planning process, and consists of a problem to solve, operators to select, variable bindings to
apply during operator expansion, additional world predicates to establish, and new objects and
properties to define. Arguments to the QC that are filled with variables are either predefined (for
nonlocal variables that are bound before this operator is even applied) or are to be selected by the
QC (for local variables whose values need to be set in accordance with the constraints specified in
the precondition). The two types of variables require different constraints on the planning process
to establish their values.
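One plausible Python representation of an experiment, and of the two ways of constraining QC variables, is sketched below. The Experiment fields mirror the five components listed above, but their names, the boom_experiment helper, and the generated object name are assumptions, as is the specific strategy: the nonlocal sea state is fixed by establishing a world predicate, whereas the local variable boom.1 is fixed by defining a test boom with the desired max-sea-state and binding boom.1 to it.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    """Constraints applied to one planning run (field names are illustrative)."""
    problem: str                                     # problem to solve
    operators: list = field(default_factory=list)    # operators to select
    bindings: dict = field(default_factory=dict)     # variable bindings during expansion
    facts: list = field(default_factory=list)        # extra world predicates to establish
    new_objects: dict = field(default_factory=dict)  # object -> {property: value}


def boom_experiment(problem, operator, sector, time, sea_state, max_sea_state):
    """Hypothetical experiment for the boom example.

    The sea state involves nonlocal variables bound before the operator applies, so it
    is controlled by establishing a world predicate. boom.1 is a local variable, so it
    is controlled by defining a boom with the desired property and binding boom.1 to it.
    """
    boom = f"test-boom-{max_sea_state}"
    return Experiment(
        problem=problem,
        operators=[operator],
        bindings={"boom.1": boom},
        facts=[("sea-state", sector, time, sea_state)],
        new_objects={boom: {"class": "boom", "max-sea-state": max_sea_state}},
    )


exp = boom_experiment("spill-response", "get-boom-operator", "sector1", "t1",
                      sea_state=3, max_sea_state=4)
print(exp.bindings, exp.facts)
```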

The  other  arguments  (*  or  class  names)  are  values  that  provide  constraints  for  determining 
whether  or  not  to  select  this  operator,  and  for  instantiating  the  other  variables  within  the  operator. 
These values, however, cannot be set directly; they are dependent on the variable choices (e.g., in
ml-get-booml, the max-sea-state value * depends on the choice of boom.1). For
example, in the ml-get-booml operator in Figure 2, the first QC indicates that the resource
boom.1 should be selected, in part, on the basis of its max-sea-state value. That is, the variable
boom.1 should be bound to a boom with an appropriate value for max-sea-state, where the
meaning  of  appropriate  is  yet  to  be  defined.  Moreover,  selecting  a  boom  with  an  inappropriate 
value  for  max-sea-state  may  cause  the  operator  to  fail.  The  second  QC  indicates  that  the 
sea-state  at  the  place  and  time  of  the  operation  will  be  relevant  for  determining  whether  or  not 
to  select  this  operator.  If  the  sea-state  does  not  fall  within  an  appropriate  range  (again,  where 
the  precise  meaning  of  appropriate  must  be  learned),  the  operation  may  fail. 
The generated experiments should therefore vary the values of the wild cards (corresponding
in this example to the max-sea-state value of boom.1 and the sea-state in sea-sector.1
at time latest.1), generate plans and plan evaluations to collect data about the success of the
operation for different values of these wild cards, and construct hypotheses about the values
required for the operator to succeed.

To select the possible values for each wild card, the system matches each QC against the
initial world state. From the set of all possible matches (instantiations of the QC), the system then
selects a representative subset by associating a value (the binding of its wild cards) with each match.
This subset consists of a list of bindings for the wild cards in the QC. (Wild cards include the *
argument and class names, but not variable names.) One instantiated QC is selected for each
distinct value that was seen.
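The selection step can be illustrated with a short Python sketch. It is not the KATY code: it assumes that ground predicates are stored as tuples and that the caller already knows which argument positions hold wild cards; the name `representative_matches` is hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical representation: a ground predicate such as
# (sea-state sf-bay 1 1) becomes the tuple ("sea-state", "sf-bay", 1, 1).
Fact = Tuple[Hashable, ...]


def representative_matches(matches: Sequence[Fact],
                           wildcard_positions: Sequence[int]) -> List[Fact]:
    """Keep one instantiation of a QC for each distinct wild-card binding.

    wildcard_positions gives the tuple indices filled by * or a class name
    in the QC; positions filled by variables are ignored. The first match
    seen for each binding is kept, so the original order is preserved.
    """
    chosen: Dict[Tuple[Hashable, ...], Fact] = {}
    for match in matches:
        key = tuple(match[i] for i in wildcard_positions)
        if key not in chosen:  # first instantiation with this value wins
            chosen[key] = match
    return list(chosen.values())
```

With the three sea-state matches in the example that follows and the wild card at index 3, the sketch keeps the sf-bay and richards-bay instantiations, matching the selection the text describes.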

For example, in ml-get-booml, for the sea-state QC, the matches might be

(sea-state sf-bay 1 1)

(sea-state richards-bay 1 2)

(sea-state drakes-bay 1 1).

In  this  case,  the  values  are  1, 2,  and  1,  respectively,  and  the  first  and  second  matches  would  be 
selected.  For  each  of  these  matches,  one  or  more  constraints  are  identified  that  will  result  in  this 
instantiation  of  the  QC  in  a  generated  plan.  Local  variables  (those  whose  value  can  be  selected  at 
the  time  the  operator  is  applied)  have  variable  binding  constraints.  For  nonlocal  variables,  whose 
value  has  been  determined  in  a  previously  applied  operator,  the  process  is  more  complicated,  and 
requires  that  the  initial  world  state  be  modified  in  such  a  way  that  the  variable  will  be  bound  as 
desired. For example, sea-sector.1 in ml-get-booml is bound at a previous planning level to
a  location  that  is  defined  as  a  sensitive  area  in  the  planning  scenario.  We  do  not  have  a  general 
solution  for  computing  the  required  modifications,  so  the  user  or  system  designer  must  tell  the 
system  the  correct  predicate(s)  to  establish  for  each  QC. 
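The split between local and nonlocal variables can be sketched as follows. This is a hypothetical illustration: variables are recognized by a naming convention (a dot, as in boom.1), the constraint dictionary format is invented, and names such as sensitive-area and boom-a are placeholders. Because no general method exists for nonlocal variables, the sketch simply looks up the predicates that the user or system designer has supplied for each instantiation.

```python
from typing import Dict, Hashable, List, Mapping, Sequence, Set, Tuple

Fact = Tuple[Hashable, ...]


def is_variable(arg: Hashable) -> bool:
    """Hypothetical convention: variables look like 'boom.1' or 'sea-sector.1'."""
    return isinstance(arg, str) and "." in arg


def constraints_for_match(qc: Fact, match: Fact, local_vars: Set[str],
                          user_assertions: Mapping[Fact, Sequence[Fact]]
                          ) -> Dict[str, List]:
    """Return the planning constraints that should reproduce `match`.

    Local variables become variable-binding constraints. If the QC contains
    a nonlocal variable, predicates supplied by the user or system designer
    (keyed by the instantiation) are looked up instead.
    """
    bindings: List[Tuple[str, Hashable]] = []
    has_nonlocal = False
    for pattern, value in zip(qc[1:], match[1:]):
        if not is_variable(pattern):
            continue  # * or class name: its value follows from the variables
        if pattern in local_vars:
            bindings.append((pattern, value))
        else:
            has_nonlocal = True
    assertions: List[Fact] = []
    if has_nonlocal:
        if match not in user_assertions:
            raise LookupError(f"no user-supplied predicates for {match!r}")
        assertions = list(user_assertions[match])
    return {"bind": bindings, "assert": assertions}
```

Raising an error when no predicates were supplied mirrors the current situation, in which the user must provide this knowledge rather than the system computing it.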

The  cross-product  of  experiments  for  each  QC  is  formed,  resulting  in  a  list  of  combined 
experiments.  This  is  currently  done  explicitly,  but  in  principle  could  be  done  dynamically  as 
individual  experiments  are  selected  for  each  QC.  The  Select  Experiment  subprocess  then  selects 
an experiment from the list, by asking the user to choose one, or by selecting the first experiment on
the  list.  An  open  issue  that  remains  to  be  resolved  is  to  develop  methods  for  selecting  an  experiment 
automatically. 
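A minimal sketch of forming the cross-product and selecting an experiment, assuming that each per-QC experiment is a dictionary from a constraint kind to a list of constraints (a made-up representation), and that `ask_user` is a hypothetical callback returning the index of the user's choice:

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical representation: an experiment maps a constraint kind
# (e.g., "bind", "assert") to a list of constraints of that kind.
Experiment = Dict[str, list]


def combine_experiments(per_qc: Sequence[Sequence[Experiment]]) -> List[Experiment]:
    """Form the cross-product of the experiments generated for each QC."""
    combined: List[Experiment] = []
    for combo in product(*per_qc):  # one experiment chosen per QC
        merged: Experiment = {}
        for experiment in combo:
            for kind, constraints in experiment.items():
                # setdefault creates a fresh list, so inputs are not mutated
                merged.setdefault(kind, []).extend(constraints)
        combined.append(merged)
    return combined


def select_experiment(experiments: Sequence[Experiment],
                      ask_user: Optional[Callable[[Sequence[Experiment]], int]] = None
                      ) -> Optional[Experiment]:
    """Let the user pick an experiment by index, or default to the first."""
    if not experiments:
        return None
    if ask_user is not None:
        return experiments[ask_user(experiments)]
    return experiments[0]
```

Building the complete list up front corresponds to the current explicit approach; iterating lazily over `itertools.product` instead would be one way to realize the dynamic variant mentioned above.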

3.2.4  Data  Generation 

The  Data  Generation  process  uses  the  selected  experiment  to  create  the  appropriate  input  data 
(training  examples)  for  the  Inductive  Learning  subprocess.  This  process  incorporates  three 
components: Generate Plan, Evaluate Plan, and Generate Training Example.

3.2.4.1  Generate  Plan 

A  plan  is  generated  interactively  or  automatically,  using  the  constraints  represented  by  the 
selected  experiment.  The  constraints  that  can  be  specified  in  an  experiment  are  as  follows. 

•  Variable  Binding:  bind  a  given  variable  to  a  given  value  in  the  specified  operator. 

•  Operator  Selection:  always  select  a  specified  operator  when  it  is  applicable. 

•  Predicate  Assertion:  assert  one  or  more  predicates  in  the  initial  world  state. 

•  Object  Creation:  define  a  new  object  with  given  properties  before  planning. 
The implementation of these constraints uses SIPE-2’s built-in hooks for variable binding, operator
selection,  and  predicate/object  definition;  therefore,  we  wrote  relatively  little  code  to  implement 
this  module. 

If  other  plans  have  already  been  generated  using  similar  experiments,  a  limited  capability 
exists  to  reuse  those  plans.  Specifically,  the  system  can  define  new  objects  and  rebind  variable 
values  in  an  existing  plan.  Therefore,  if  a  new  experiment  differs  from  a  previous  one  only  in  the 
variable-binding  constraints,  the  system  can  reuse  the  plan,  rather  than  constructing  a  new  one  from 
scratch. Additional development would also allow the system to use SIPE-2’s replanning
capabilities  to  apply  different  operator-selection  and  predicate-assertion  constraints  to  previously 
generated  plans. 
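The four constraint types and the plan-reuse test can be summarized in a small data structure. The field names and `can_reuse_plan` are hypothetical, as are the problem and object names in any usage; the sketch encodes only the rule stated above, that a plan may be reused when two experiments differ solely in their variable-binding constraints.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Set, Tuple


@dataclass
class Experiment:
    """The constraint types an experiment can specify (field names are hypothetical)."""
    problem: str
    # Variable binding: (operator, variable) -> value bound during expansion.
    bindings: Dict[Tuple[str, str], Hashable] = field(default_factory=dict)
    # Operator selection: operators always chosen when applicable.
    selected_operators: Set[str] = field(default_factory=set)
    # Predicate assertion: predicates added to the initial world state.
    asserted_predicates: Set[Tuple[Hashable, ...]] = field(default_factory=set)
    # Object creation: new object name -> its properties.
    new_objects: Dict[str, Dict[str, Hashable]] = field(default_factory=dict)


def can_reuse_plan(new: Experiment, previous: Experiment) -> bool:
    """True when `new` differs from `previous` only in variable bindings,
    so the earlier plan can be rebound instead of replanned from scratch."""
    return (new.problem == previous.problem
            and new.selected_operators == previous.selected_operators
            and new.asserted_predicates == previous.asserted_predicates
            and new.new_objects == previous.new_objects)
```

Because the system can also define new objects in an existing plan, relaxing the comparison of `new_objects` would be a natural extension, but that goes beyond the rule as stated.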

3.2.4.2  Evaluate  Plan 

The  plan  is  sent  to  the  evaluation  module,  which  returns  an  evaluation  that  indicates  the 
success  or  failure  of  the  overall  plan  and  of  individual  actions  or  action  sequences  in  the  plan.  The 
definition  of  “success”  depends  on  the  domain  and  on  the  purpose  of  the  operator  being  learned 
(e.g.,  whether  a  unit  arrived  on  time,  or  oil  was  successfully  cleaned  up). 

Plan  evaluation  is  a  difficult  problem  in  general.  When  plan  evaluation  methods  are  used  in 
conjunction  with  other  tools  such  as  the  Operator  Learner,  the  inherent  difficulty  of  plan  evaluation 
is  exacerbated  by  the  need  for  the  evaluator  and  learner  to  share  a  semantic  representation  of  the 
plan.  In  the  rest  of  this  section,  we  discuss  the  problems  that  arose  in  this  area  during  this  project. 

One  factor  that  made  it  difficult  to  assess  the  success  of  an  operator  in  the  oil-spill  domain  is 
that  there  is  not  a  direct  mapping  between  the  purpose  of  a  given  operator  and  the  quantities 
returned  by  the  evaluation  model.  There  are  three  primary  reasons  for  this  lack  of  direct  mapping: 
first,  multiple  operators  are  often  applied  to  achieve  a  single  shared  goal,  so  credit  assignment  is  a 
problem.  For  example,  in  the  ml-get-booml  operator,  the  purpose  is  to  get  a  certain  quantity  of 
boom  to  the  location  of  the  operation;  this  can  be  done  by  combining  several  operators  that  each 
bring a smaller quantity of boom to the final destination. Therefore, the actions in a given operator
may succeed even though that operator, by itself, does not achieve the overall purpose.

Second, the purpose in the SIPE-2 sense may not match the conceptual purpose of the operator.
In  the  ml-get-booml  operator,  the  purpose  slot  lists  the  boom  level  as  the  operator’s  purpose,  but 
a  planning  expert  would  say  that  the  actual  purpose  of  this  operator  is  to  contain  a  certain  quantity 
of  oil.  Although  the  operator  clearly  would  fail  if  the  boom  did  not  arrive  in  the  given  location  by 
the  required  time,  the  operator  would  also  fail  if  the  boom  did  not  work  properly  in  the  prevailing 
ocean  conditions;  this  conclusion  is  not  explicitly  stated  in  the  purpose  slot. 

Finally,  even  if  “containing  the  oil  in  the  sector”  were  explicitly  stated  in  the  operator’s 
purpose,  that  goal  would  not  be  explicitly  represented  in  the  evaluation  model.  Rather,  the  output 
of  the  evaluation  model  specifies  the  quantity  of  oil  in  each  sector  at  each  point  in  time,  as  well  as 
the  quantities  that  have  washed  up  on  shore,  evaporated,  or  been  transported  out  of  the  sector 
(e.g.,  via  skimmer).  “Containing  the  oil”  thus  must  be  translated  into  a  quantitative  measure  of 
those  values,  which  raises  many  questions:  How  much  oil  must  be  contained  (and  by  what  time), 
for  the  operator  to  be  considered  successful?  When  oil  is  removed  by  skimmer  and  barge,  so  that 
no  oil  is  left  in  the  sector  and  no  oil  has  escaped,  should  the  operator  be  considered  to  have 
succeeded?  If  the  oil  sinks  before  it  is  contained,  so  that  it  cannot  be  cleaned  up  although  the  ocean 
surface  is  clear,  has  the  operator  succeeded? 
In the military transportation planning domain, there appears to be a more obvious mapping
between  operator  purpose  (in  deployment  planning,  the  purpose  is  usually  to  ensure  that  a  given 
unit  arrives  at  a  location  by  a  specified  time)  and  simulator  results  for  at  least  a  subset  of 
deployment operators (although this is not always true). In general, we believe that better
ontologies and reasoning methods for mapping between plans (and planning knowledge) and
evaluation criteria are needed for these applications to interact effectively.

3.2.4.3  Generate  Training  Example 

The  Operator  Learner  next  uses  the  qualitative  constraints  to  generate  a  training  example  for 
the  operator  being  learned,  using  the  new  plan  and  its  evaluation.  The  training  example  consists  of 
the  relevant  world  state  (i.e.,  any  instantiations  that  match  the  qualitative  constraints)  at  the  point 
in  the  plan  where  the  operator  was  applied,  and  a  positive  or  negative  label,  depending  on  whether 
the  operator  succeeded  or  failed. 

The  world  state  at  a  specified  node  is  generated  by  executing  the  plan  up  to  that  node.  (The 
effects  of  this  execution  are  undone  after  the  training  example  is  generated.)  The  qualitative 
constraints  in  the  operator  being  learned  determine  which  predicates  and  properties  are  extracted 
from  the  world  state. 
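A sketch of the extraction step, assuming that the world state at the node has already been computed (the execute-and-undo step is not modeled), that every variable in the QCs is bound at that node, and that class-name arguments, which would need a sort-hierarchy check, do not occur. The function names and predicate tuples are hypothetical.

```python
from typing import FrozenSet, Hashable, Iterable, Mapping, Sequence, Tuple

Fact = Tuple[Hashable, ...]
Example = Tuple[FrozenSet[Fact], bool]


def arg_matches(value: Hashable, pattern: Hashable,
                bindings: Mapping[str, Hashable]) -> bool:
    """'*' matches anything; a bound variable matches its value; else literal."""
    if pattern == "*":
        return True
    if pattern in bindings:
        return bindings[pattern] == value
    return pattern == value


def make_training_example(world_state: Iterable[Fact], qcs: Sequence[Fact],
                          bindings: Mapping[str, Hashable],
                          succeeded: bool) -> Example:
    """Keep the facts that instantiate some QC and attach the outcome label."""
    relevant = {
        fact for fact in world_state
        if any(len(fact) == len(qc) and fact[0] == qc[0]
               and all(arg_matches(v, p, bindings)
                       for v, p in zip(fact[1:], qc[1:]))
               for qc in qcs)
    }
    return frozenset(relevant), succeeded
```

Facts about other sectors or other booms are dropped because the bound variables no longer match, so the example contains only the context the QCs declare relevant.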

The  training  example  is  sent  to  the  inductive  learning  system.  This  procedure  currently  uses 
PAGODA,  but  is  implemented  as  a  “black  box”  module  that  could  be  replaced  by  a  different 
induction  method,  or  by  multiple  competing  methods. 
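One way to express the “black box” boundary is a small interface that any induction method must satisfy. In this sketch, `InductiveLearner`, `BaseRateLearner`, and `Committee` are hypothetical names; `BaseRateLearner` is a trivial placeholder rather than PAGODA, and averaging predictions is only one possible way to combine competing methods.

```python
from typing import FrozenSet, Hashable, Iterable, List, Protocol, Tuple

Fact = Tuple[Hashable, ...]
Example = Tuple[FrozenSet[Fact], bool]


class InductiveLearner(Protocol):
    """Interface an induction method must offer to be swapped in."""

    def add_example(self, example: Example) -> None: ...

    def predict(self, state: FrozenSet[Fact]) -> float: ...  # P(operator succeeds)


class BaseRateLearner:
    """Trivial stand-in: ignores the state and returns the smoothed success rate."""

    def __init__(self) -> None:
        self.positives = 0
        self.total = 0

    def add_example(self, example: Example) -> None:
        self.total += 1
        self.positives += int(example[1])

    def predict(self, state: FrozenSet[Fact]) -> float:
        return (self.positives + 1) / (self.total + 2)  # Laplace smoothing


class Committee:
    """Feeds each example to several competing learners and averages them."""

    def __init__(self, learners: Iterable[InductiveLearner]) -> None:
        self.learners: List[InductiveLearner] = list(learners)
        if not self.learners:
            raise ValueError("need at least one learner")

    def add_example(self, example: Example) -> None:
        for learner in self.learners:
            learner.add_example(example)

    def predict(self, state: FrozenSet[Fact]) -> float:
        return sum(l.predict(state) for l in self.learners) / len(self.learners)
```

Because examples are consumed one at a time through `add_example`, the same interface also fits incremental learners that update after every training example.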

3.2.5  Learning  from  an  Expert  User 

In  addition  to  learning  from  simulators  and  automated  evaluation  tools,  the  Operator  Learner 
can learn from an expert’s planning choices. Whenever the human planner chooses an operator during
planning, a set of training examples is generated that describe the planning context (i.e., the state
of  the  world  at  the  node  where  the  operator  was  applied).  A  positive  example  is  generated  for  the 
chosen  operator,  indicating  that  the  operator  is  applicable  in  that  context.  Negative  examples  are 
generated  for  each  operator  that  is  not  selected,  indicating  that  the  operator  does  not  apply  (or  is 
less  applicable)  in  the  current  context.  For  example,  if  the  planner  frequently  picks  a  certain 
operator  for  actions  in  the  Middle  East,  and  a  different  operator  for  actions  in  the  Pacific  region,  a 
geographic  precondition  would  be  learned  for  each  operator. 
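The example-generation rule can be sketched as below. Operator names and the region predicate are placeholders, and `selection_rate` is a crude frequency count standing in for the inductive learner; it only shows how repeated choices make a geographic pattern visible.

```python
from typing import FrozenSet, Hashable, List, Optional, Sequence, Tuple

Fact = Tuple[Hashable, ...]
# (operator, planning context, was this operator chosen?)
ChoiceExample = Tuple[str, FrozenSet[Fact], bool]


def examples_from_choice(context: FrozenSet[Fact], applicable: Sequence[str],
                         chosen: str) -> List[ChoiceExample]:
    """One positive example for the chosen operator, one negative for each other."""
    if chosen not in applicable:
        raise ValueError(f"{chosen!r} is not among the applicable operators")
    return [(op, context, op == chosen) for op in applicable]


def selection_rate(examples: Sequence[ChoiceExample], operator: str,
                   fact: Fact) -> Optional[float]:
    """Fraction of contexts containing `fact` in which `operator` was chosen."""
    labels = [chosen for op, ctx, chosen in examples
              if op == operator and fact in ctx]
    return sum(labels) / len(labels) if labels else None
```

Every unchosen operator becomes a negative example even if it was also applicable, which is one source of the noise discussed next.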

In  the  process  of  learning  from  the  user,  the  feedback  consists  of  training  examples  generated 
by  the  users  from  the  choices  they  make  during  planning.  The  user  may  guide  the  learning  process 
by  deliberately  setting  up  planning  problems  on  which  to  train  the  system,  or  may  simply  train  the 
system  “on  the  job”  by  allowing  it  to  observe  the  expert’s  planning  behavior  during  the  actual 
planning  process.  The  latter  method  is  a  particularly  useful  way  to  apply  machine  learning 
techniques  during  actual  planning,  since  knowledge  is  continuously  acquired  throughout  the  life  of 
the  system.  The  incremental  nature  of  PAGODA  is  an  advantage  for  this  type  of  learning,  because 
the  system  improves  its  behavior  incrementally  as  each  new  training  example  arrives. 

Each training example consists of a description of the world state at a particular point in the
plan, and an operator that the user did or did not choose; chosen and unchosen operators yield
positive and negative training examples, respectively. These examples are used to induce
candidate  preconditions  for  the  operators.  The  same  inductive  learning  techniques  that  are  used  for 
the  feedback  from  the  simulator  are  applied  to  learn  these  preconditions. 
These examples are more likely to be “noisy” or incorrect than the examples received from
the simulator, for the following reasons: (1) users sometimes make arbitrary or incorrect choices;
(2) false negatives occur, since more than one operator may be applicable and the operators that
are not selected are classified as negative examples; (3) the user may understand certain aspects of
the situation that are not reflected in SIPE-2’s world state; (4) users sometimes exhibit superstitious
behavior (always preferring a particular operator even when another would be more appropriate);
and (5) planning may later fail, indicating that the operator choice was incorrect. The last of these
conditions can sometimes be detected by identifying cases in which the system backtracks and an
alternative operator is applied during backtracking. To mitigate the effects of the other sources of
noise, probabilistic learning methods are used, and learned knowledge is always confirmed with an
expert user before being added to the system.

In  essence,  this  process  consists  of  acquiring  “hidden”  knowledge:  that  is,  knowledge  that  the 
user  has  but  that  may  not  be  explicitly  represented  in  the  system.  In  this  case,  nothing  in  the 
learning  system’s  representation  of  the  current  world  state  enables  it  to  distinguish  between  two 
situations  in  which  the  user  makes  different  decisions.  The  system  could  infer  that  there  should  be 
an  additional  predicate  to  represent  this  distinction,  and  would  then  add  the  predicate  to  its 
representation  for  future  use.  This  process  would  enable  us  not  only  to  refine  the  operators,  but  to 
improve  the  representation  of  the  domain. 

3.3  PAGODA 

PAGODA  is  a  model  for  an  intelligent  autonomous  agent  that  learns  and  plans  in  complex, 
nondeterministic  domains  [desJardins  1992].  The  guiding  principles  behind  PAGODA  include 
probabilistic  representation  of  knowledge,  Bayesian  evaluation  techniques,  and  limited  rationality 
as  a  normative  behavioral  goal. 

We  are  using  only  the  probabilistic  inductive  learning  component  of  PAGODA  for  this  project. 
The  inductive  hypotheses  are  represented  as  sets  of  conditional  probabilities  that  specify  the 
distribution  of  a  predicted  feature’s  value,  given  a  set  of  input  features.  A  probabilistic  inference 
mechanism  allows  PAGODA  to  make  predictions  about  the  value  of  the  output  feature  in  a  given 
world  state  by  combining  the  relevant  probabilities. 

Theories  are  generated  by  means  of  a  heuristic  search  process,  guided  by  the  training 
examples.  The  theories  are  evaluated  by  means  of  a  Bayesian  technique  that  provides  a  tradeoff 
between  the  accuracy  and  the  simplicity  of  learned  theories.  The  prior  probability  of  a  theory  is  a 
measure  of  its  simplicity — shorter  theories  are  more  probable. 
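To make the accuracy/simplicity tradeoff concrete, the following sketch scores candidate theories about sea-state ranges by an unnormalized log posterior. It substitutes simple assumptions for PAGODA’s actual machinery [desJardins 1992]: theories are sets of non-overlapping sea-state ranges, each rule costs a made-up `BITS_PER_RULE` under a prior proportional to 2^-(description length), and rule probabilities are plug-in Laplace estimates rather than being integrated out.

```python
import math
from typing import List, Sequence, Tuple

Rule = Tuple[int, int]      # inclusive sea-state range [lo, hi]
Example = Tuple[int, bool]  # (observed sea-state, did the operator succeed?)

BITS_PER_RULE = 4.0  # hypothetical description-length cost of one rule


def fit(theory: Sequence[Rule], data: Sequence[Example]) -> List[float]:
    """Laplace-smoothed P(success) for each rule's range."""
    probs = []
    for lo, hi in theory:
        outcomes = [ok for s, ok in data if lo <= s <= hi]
        probs.append((sum(outcomes) + 1) / (len(outcomes) + 2))
    return probs


def log_posterior(theory: Sequence[Rule], data: Sequence[Example]) -> float:
    """Unnormalized log P(theory | data) = log prior + log likelihood.

    Ranges are assumed not to overlap and to cover every observed value.
    """
    probs = fit(theory, data)
    log_prior = -BITS_PER_RULE * len(theory) * math.log(2)  # shorter is likelier
    log_lik = 0.0
    for s, ok in data:
        for (lo, hi), p in zip(theory, probs):
            if lo <= s <= hi:
                log_lik += math.log(p if ok else 1.0 - p)
                break
    return log_prior + log_lik
```

On a small, slightly noisy data set in which sea states 1-3 mostly succeed and 4-5 mostly fail, the two-rule theory [1-3], [4-5] scores higher than both the single-rule theory [1-5] and a five-rule theory with one range per value: the prior penalizes the extra rules more than they improve the fit, while the single rule fits the data too poorly.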

Since SIPE-2 cannot represent the probabilistic theories learned by PAGODA as preconditions,
we  use  thresholding  to  create  deterministic  preconditions.  Information  is  lost  in  this  process:  in 
general,  the  deterministic  preconditions  are  overly  strict  (i.e.,  they  sometimes  rule  out  an  operator 
in  a  case  where  it  is,  in  fact,  applicable).  Each  rule  that  PAGODA  learns  states  that  in  situation  S,  an 
action  or  operator  A  succeeds  with  probability  P.  The  learning  system  analyzes  the  theories  (rule 
sets) to identify situations S such that if S is true, A succeeds with probability greater than some
threshold P_success, and if S is false, A fails with probability greater than another threshold P_failure. These
situations  are  the  discrimination  conditions  for  A,  and  are  added  to  the  system  as  preconditions 
after  they  are  confirmed  by  the  user. 
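The thresholding step can be sketched with frequency estimates in place of PAGODA’s learned probabilities. Candidate situations are given as named predicates over the sea-state value; the names and the default thresholds of 0.8 are illustrative assumptions, not values from the project.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[int, bool]  # (observed sea-state, did the operator succeed?)


def discrimination_conditions(conditions: Dict[str, Callable[[int], bool]],
                              data: Sequence[Example],
                              p_success: float = 0.8,
                              p_failure: float = 0.8) -> List[str]:
    """Return the names of conditions S with P(success | S) > p_success
    and P(failure | not S) > p_failure, estimated by frequency counts."""
    accepted = []
    for name, holds in conditions.items():
        inside = [ok for s, ok in data if holds(s)]
        outside = [ok for s, ok in data if not holds(s)]
        if not inside or not outside:
            continue  # cannot estimate both probabilities
        p_succ = sum(inside) / len(inside)
        p_fail = (len(outside) - sum(outside)) / len(outside)
        if p_succ > p_success and p_fail > p_failure:
            accepted.append(name)
    return accepted
```

On data where sea states 1-3 usually succeed and 4-5 usually fail, only the condition sea-state <= 3 passes both thresholds. The resulting deterministic precondition would also rule out the occasional success at sea state 5, which is the kind of over-strictness described above.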
To give a very simple example of how inductive learning works, suppose that in
ml-get-booml, the result of evaluating the plan is such that regardless of the specific sea-sector
and time, whenever the value of the sea-state predicate is 3 or less, the operator succeeds, and
whenever it is 4 or more, the operator fails. The learning system would form the hypothesis
(sea-state sea-sector.1 latest.1 [1-3]) => success
(sea-state sea-sector.1 latest.1 [4-5]) => failure.

If  there  is  noise  or  randomness  (e.g.,  in  some  cases  the  operator  fails,  even  though  sea-state  is  3  or 
less,  or  sometimes  succeeds  when  sea-state  is  4  or  5),  the  probabilistic  hypothesis  evaluation  model 
built into PAGODA determines the most probable hypothesis. The hypothesis evaluation and
inductive learning mechanisms of PAGODA are detailed elsewhere [desJardins 1992].

3.4  PREDICATE  EDITOR 

The  Predicate  Editor  is  a  revised  and  generalized  version  of  what  was  originally  called  the 
Information  Window  in  SOCAP.  Previously,  this  display  was  tailored  to  a  particular  scenario, 
showing  specific  world  predicates  in  each  of  six  fixed  panes.  The  generalized  Information  Window 
allows  the  user  to  tailor  a  presentation  for  different  domains  and  for  different  views  of  a  given 
domain,  by  specifying  how  many  panes  appear,  and  which  predicates  are  displayed  in  each  pane. 

3.5  GKB  EDITOR 

Under  a  separate  contract,  SRI  has  developed  the  GKB  Editor  for  editing  class  and  object 
hierarchies  [Karp,  Myers,  and  Gruber  1995].  The  GKB  Editor  provides  a  graphical  interface  for 
browsing  and  modifying  classes  and  instances  and  their  properties.  The  underlying  representation 
is  the  Generic  Frame  Protocol  (GFP),  so  object  hierarchies  created  with  the  GKB  Editor  can  be 
shared with any system that understands (or provides an interface to) the GFP. KATY provides an
interface to this editor for creating and modifying the SIPE-2 sort hierarchy.


4  SIMULATORS 

Our  original  proposal  was  to  develop  and  apply  knowledge  acquisition  tools  in  SOCAP’s 
military  transportation  application  domain.  Due  to  representational  inadequacies  of  the  available 
transportation  simulators,  and  the  integration  difficulties  their  use  presented,  we  decided  to  use  an 
oil-spill  domain  instead.  In  the  following  subsections,  we  discuss  the  simulators  and  evaluation 
tools  that  were  available  for  the  oil-spill  and  military  deployment  domains,  and  explain  why  we 
selected  this  oil-spill  domain. 

4.1  OIL-SPILL  DOMAIN 

The  Operator  Learner  demonstrations  used  an  oil-spill  planning  domain,  as  noted  above.  The 
Spill  Response  Configuration  System  (SRCS)  plans  responses  to  coastal  oil  spills  and  identifies 
equipment  shortfalls.  The  SRCS  incorporates  a  spreadsheet-based  evaluation  model  that  is  used  for 
the  feedback  required  by  the  Operator  Learner. 

The  oil-spill  domain  is  a  good  analogue  to  the  military  transportation  planning  domain,  so  that 
our  development  work  for  the  oil-spill-based  demonstration  can  be  applied  directly  to  the  military 
domain.  Both  domains  are  crisis  response  planning  situations,  where  actions  to  respond  to  an 
emergency  must  be  identified,  along  with  the  resources  needed  to  perform  the  actions.  In  both 
cases,  methods  for  moving  the  requisite  equipment  to  the  crisis  site  must  also  be  identified. 

The  success  of  an  action  in  a  plan  in  the  oil-spill  domain  translates  to  the  percentage  of  oil 
cleaned  up  in  the  specified  sector.  In  the  demonstration  scenario,  positive  instances  for  the  learning 
system  are  those  actions  resulting  in  greater  than  P%  of  the  oil  being  cleaned  up  (where  P  is  a  fixed 
value  determined  by  a  domain  expert).  A  future  research  direction  would  be  to  explore  methods  of 
learning  the  degree  of  success  of  an  action. 
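A minimal sketch of the labeling rule, assuming a hypothetical per-sector summary of the evaluation model’s output; P is supplied by the caller, since the text says it is fixed by a domain expert.

```python
from dataclasses import dataclass


@dataclass
class SectorOutcome:
    """Hypothetical summary of the evaluation model's output for one sector."""
    initial: float    # oil in the sector when the action begins
    recovered: float  # oil removed from the sector (e.g., by skimmer and barge)


def is_positive(outcome: SectorOutcome, p_percent: float) -> bool:
    """Label an action positive if more than P% of the oil was cleaned up.

    Counting only recovered oil is an assumption; how to treat evaporated,
    beached, or sunk oil remains an open question.
    """
    if outcome.initial <= 0:
        raise ValueError("sector must start with a positive quantity of oil")
    return 100.0 * outcome.recovered / outcome.initial > p_percent
```

The strict inequality follows the wording “greater than P%”, so an action that cleans up exactly P% is labeled negative.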

The  success  of  an  action  in  the  oil-spill  domain  has  a  direct  analogue  in  the  military 
transportation  planning  domain,  where  the  degree  of  success  is  determined  by  how  much  time  a 
unit  is  delayed  (i.e.,  whether  and  by  how  much  time  a  unit  arrives  late)  and  how  many  resources 
(transportation  assets,  fuel,  or  personnel)  are  used. 

We  identified  three  alternative  demonstration  scenarios,  in  addition  to  the  sea-state  example 
described  in  Subsection  3.2.  The  first  of  these  scenarios  was  used  for  the  final  project 
demonstration,  which  showed  the  system  learning  high-level  strategies  for  oil-spill  cleanup  by 
observing  the  user’s  choices.  The  three  scenarios  are  outlined  below. 

1. In the top-level operators, learn the conditions under which each type of operation is
effective.  This  learning  activity  might  involve  identifying  the  priority  of  each 
targeted  sea  sector,  based  on  the  expected  degree  of  effectiveness  of  the  plan;  or  it 
might  involve  simply  learning  preconditions  to  identify  the  best  single  sea  sector. 

The  factors  relevant  to  this  prediction  include 

- The proximity of the cleanup site to the spill

-  The  depth  of  water  (shallower  water  makes  cleanup  more  effective) 

-  The  location  of  protected  areas 

-  The  equipment  available 

-  Weather  conditions. 

2.  At  the  middle  level  of  planning,  learn  which  type  of  skimmer  works  best  under  which 
conditions.  The  relevant  factors  include 

-  The  sea  state  (a  number  summarizing  the  severity  of  weather,  which  is  in  turn 
determined  by  tides,  waves,  and  currents) 

-  The  oil  thickness  (available  from  the  trajectory  model) 

-  The  encounter  rate  (determined  by  sweep  area,  speed,  and  skim  rate,  which  are 
properties  of  the  boat  and  boom  used) 

- The recovery rate (i.e., the percentage of the oil encountered that is successfully
skimmed) 

-  The  efficiency 

-  The  pump  rate 

-  The  storage  capacity  (on  board,  as  well  as  in  bladders  or  tugs  available  for 
offloading) 

-  The  personnel  available 

-  The  time  of  day. 
3. At the lowest level of planning, learn to predict the length of boom required for a
given situation. Relevant factors include

-  The  weather  and  ocean  conditions 

-  The  angle  of  boom  with  respect  to  current  (a  shallow  angle  is  more  effective  in 
deflecting,  but  requires  more  boom) 

-  The  ocean  current 

-  The  boom  height 

-  The  purpose  of  using  boom  (excluding  oil  from  a  protected  area,  versus 
deflecting  it  to  a  shore  or  water  area  where  it  will  be  cleaned  up) 

-  The  type  of  boom 

-  The  depth  of  water 

-  The  leakage  rate  (determined  by  the  above  factors) 

- The number of booms (e.g., doubling booms reduces leakage but requires twice
as many boom feet).

These  scenarios  show  the  wide  range  of  applicability  of  the  learning-based  methods  at  all 
planning  levels.  Inductive  learning  methods  enable  the  system  to  identify  the  factors  relevant  to  the 
success  of  each  planning  operator,  and  to  aggregate  these  factors  to  an  appropriate  level  for  each 
step  in  the  decision-making  process. 

4.2  MILITARY  DEPLOYMENT  DOMAIN 

In  the  deployment  planning  scenario  we  originally  proposed,  severe  weather  conditions 
would  cause  certain  types  of  ports  (e.g.,  those  with  unsheltered  harbors)  to  become  unavailable, 
and  certain  types  of  operations  (e.g.,  transport,  loading,  and  offloading)  to  take  longer  than  usual. 
In  addition,  decisions  about  the  allocation  of  resources  such  as  ports,  transportation  assets,  and 
personnel  would  be  made  by  matching  the  capabilities  of  the  resources  to  the  requirements 
imposed  by  the  plan.  We  decided  to  use  the  oil-spill  domain  instead  of  the  transportation  planning 
domain  or  another  military  planning  domain  (e.g.,  joint  operations  planning  or  air  campaign 
planning)  because  of  the  availability  and  suitability  of  a  plan  evaluator  for  the  oil-spill  domain.  In 
particular,  the  available  simulators  were  unable  to  represent  the  factors  that  would  impact  the 
planning  process,  as  listed  above,  and/or  the  cost  of  integration  was  too  high.  In  this  subsection,  we 
briefly  describe  the  simulators  that  were  available  for  military  planning  domains. 

4.2.1  PFE 

The  Prototype  Feasibility  Estimator  (PFE)  is  a  transportation  simulator  developed  by  Bolt 
Beranek  and  Newman  Inc.  (BBN).  It  uses  a  very  simple  model  of  port  capability,  and  does  not 
model  weather  or  other  aspects  of  the  situation.  To  be  usable  in  this  project,  PFE  would  have  to  be 
provided  with  a  time-phased  description  of  port  availability.  In  addition,  the  computation  of 
movement  times  would  have  to  be  modified  to  depend  on  a  wider  range  of  environment  features, 
such  as  the  current  weather. 

These  revisions  could  be  incorporated  by  using  the  time-phased  port  availability  information 
to  compute  PFE’s  input  (i.e.,  the  set  of  available  ports),  and  by  reimplementing  PFE’s 
time-computation  component,  which  would  require  substantial  development  effort.  In  addition, 
using  PFE  would  require  the  use  of  FMERG*  to  expand  the  major  force-level  plans  generated  by 
SOCAP  to  the  correct  level  for  running  the  simulator.  FMERG  has  not  been  maintained  since  the 
Integrated Feasibility Demonstration 2 (IFD-2); although we had installed it and
spent  some  time  working  on  its  integration  into  SOCAP,  we  realized  that  it  would  be  unrealistic  to 
rely  on  FMERG  for  this  project. 

4.2.2  TransSim 

We  acquired  the  TransSim  transportation  scheduler/simulator  from  the  University  of 
Massachusetts  (UM),  and  had  a  series  of  discussions  with  members  of  the  TransSim  development 
team  about  using  their  software.  TransSim  has  some  features  that  might  make  it  a  better  choice  than 
PFE for a military demonstration scenario. TransSim takes into account ship speeds; ship, port, and
berth availability; and weather conditions. All of these are represented explicitly and are easy to
vary  programmatically.  It  also  incorporates  UM’s  CLIP/CLASP  data  collection  package.  On  the 
other  hand,  TransSim  expects  Time-Phased  Force  Deployment  Data  (TPFDD)  as  input.  FMERG 
would  therefore  be  required  for  translation  from  the  major  force  level  of  SOCAP’s  plans  to  the 
TPFDD  level.  As  previously  explained,  the  use  of  FMERG  for  the  project  was  not  feasible. 

4.2.3  TACWAR 

We  examined  the  extensive  documentation  on  the  TACWAR  wargaming  system.  TACWAR 
is  used  for  wargaming  at  the  U.S.  Central  Command  and  at  other  locations;  it  was  developed  and 
maintained by the Institute for Defense Analyses (IDA) in Washington, D.C. TACWAR was
promising  because  it  is  the  simulation  system  that  most  closely  matches  the  type  of  scenario  (joint 
military  operations)  that  was  encoded  by  the  operators  developed  for  SOCAP  as  part  of  IFD-2. 

We  concluded  that  TACWAR  would  be  suitable  for  use  as  a  simulator  to  execute  high-level 
plans  created  by  SOCAP.  SOCAP  could  be  used  to  determine  when  a  unit  arrives  and  where  it  is 
located,  as  well  as  its  mission  and  posture;  given  this  information,  TACWAR  would  be  run  in  batch 
mode  to  simulate  the  battle.  The  granularity  of  the  representations  of  units,  geography,  and  events 
used  by  TACWAR  appears  similar  to  that  used  by  SOCAP. 

However,  the  integration  effort  required  to  use  TACWAR  would  have  been  large.  It  would 
have  required  obtaining  a  copy  of  TACWAR  (which  was  previously  part  of  the  Common 
Prototyping  Environment  [CPE],  but  was  no  longer  supported  at  the  time  we  were  evaluating 
simulators  for  this  project);  obtaining  a  suitable,  unclassified  TACWAR  scenario  that  would 
obviate  the  creation  of  the  voluminous  input  files  required  by  TACWAR;  implementing  the 
scenario  in  SOCAP;  extracting  the  plan  from  SOCAP  in  a  form  suitable  for  TACWAR;  adding  new 
operators  to  correspond  to  TACWAR’s  missions  and  postures;  establishing  scenario-controlling 
parameters  that  are  independent  of  the  plan  generated  by  SOCAP;  and  extracting  the  results  of  the 
simulation  for  use  by  SOCAP.  We  concluded  that  the  TACWAR  integration  effort  was  beyond  the 
scope  of  this  contract. 

4.2.4  CTEM 

We  discussed  with  ISX  Corporation  the  possible  use  of  CTEM  in  this  project.  CTEM  is  used 
in  the  ACPT  for  air  campaign  plan  evaluation.  We  determined  that  CTEM  would  be  too  difficult 
to  acquire  (because  of  security  classification  problems)  and  would  not  provide  feedback  at  an 
appropriate  level  of  detail.  Also,  at  the  time  we  were  prepared  to  use  it  for  this  project,  the 
knowledge acquisition process required for SIPE-2 to work in the ACP domain had not yet been
completed. 

*FMERG:  Force  Module  Enhancer  and  Requirements  Generator. 
5  EVALUATION  METRICS


We  were  unable  to  perform  a  formal  evaluation  of  KATY  under  this  contract,  due  to  time  and 
funding  limitations.  Our  experience  has  shown  the  knowledge  editing  tools  to  be  extremely  useful, 
substantially  reducing  development  time  and  the  likelihood  of  errors.  The  Operator  Learner 
performed  well  in  the  simple  demonstration  scenarios  we  developed,  and  we  expect  that  the 
learning  techniques  used  in  the  Operator  Learner  will  scale  up  well.

We  identified  a  number  of  operationally  relevant  evaluation  metrics,  both  quantitative  and 
qualitative,  for  future  evaluations  of  KATY.  These  metrics  include 

•  The  quality  of  plans,  measured  in  terms  of  probability  of  success  (e.g.,  in  a 
simulator),  or  subjectively  by  an  expert  user 

•  The  time  required  for  a  user  to  create  new  planning  operators 

•  The  quantity  of  data  needed  for  acquiring  knowledge  (e.g.,  the  number  of  training 
examples  required  by  the  inductive  learning  system) 

•  The  computational  time  and  memory  required  to  run  simulations  and  the  learning
system
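The first metric, probability of success measured in a simulator, is normally estimated from repeated stochastic runs. The following is a minimal Python sketch under stated assumptions: `simulate` is a hypothetical stand-in for a call into a real simulator that reports whether one run of the plan succeeded, and the Wilson score interval is our illustrative choice for reporting uncertainty, not something the evaluation plan above prescribes.

```python
import math
import random
from typing import Callable, Tuple


def estimate_success_probability(
    simulate: Callable[[random.Random], bool],
    n_runs: int,
    seed: int = 0,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Estimate P(success) of a plan from repeated stochastic simulator runs.

    `simulate` is a hypothetical hook: it performs one simulated execution of
    the plan using the supplied random generator and reports success.
    Returns (p_hat, low, high), where [low, high] is the Wilson score
    interval (z = 1.96 gives roughly 95% confidence).
    """
    if n_runs <= 0:
        raise ValueError("n_runs must be positive")
    rng = random.Random(seed)  # seeded so that evaluations are repeatable
    successes = sum(1 for _ in range(n_runs) if simulate(rng))
    p_hat = successes / n_runs

    # Wilson score interval: better behaved than p_hat +/- z*se near 0 or 1.
    denom = 1.0 + z * z / n_runs
    centre = (p_hat + z * z / (2 * n_runs)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n_runs + z * z / (4 * n_runs * n_runs)) / denom
    return p_hat, max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Toy stand-in for a real simulator: each run succeeds with probability 0.7.
    p, low, high = estimate_success_probability(lambda rng: rng.random() < 0.7, n_runs=1000)
    print(f"P(success) ~ {p:.3f}, 95% interval [{low:.3f}, {high:.3f}]")
```

Reporting the interval alongside the point estimate shows how many simulator runs a comparison between two plans actually needs, which connects this metric to the computational-cost metric in the last bullet.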

In  addition  to  these  system  evaluation  metrics,  our  approach,  acquiring  planning  knowledge 
from  on-line  simulators  and  evaluation  tools,  raises  the  issues  of  plan  evaluation  and  measuring 
plan  quality.  These  metrics  are  required  by  any  system  whose  function  is  to  improve  planning 
performance,  since  it  is  impossible  to  improve  performance  without  some  measurement  of  that
performance. 

Simulators  are  one  obvious  type  of  plan  evaluation  tool  for  the  domains  in  which  they  exist 
and  are  considered  to  be  reliable.  Many  military  domains  (including  transportation  planning)  use 
simulators  to  verify  plans  generated  by  humans,  so  it  seems  reasonable  that  a  learning  system 
should  consider  these  simulators  to  be  reliable  sources  of  knowledge. 

In  the  oil-spill  domain,  oil-trajectory  models  and  utility  analysis  of  oil-spill  damage  are 
widely  regarded  by  the  community  of  domain  experts  as  effective  tools  for  evaluating  response
contingency  plans.  The  demonstration  we  have  developed  in  this  domain  uses  an  oil-trajectory 
model  and  a  spreadsheet-based  utility  analysis  of  the  generated  plan.  These  tools  enable  the  system 
to  gather  feedback  about  the  success  of  the  operators  being  learned,  and  enable  the  user  to  assess 
the  quality  of  the  plans  that  are  generated  by  the  system. 
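A spreadsheet utility analysis of this kind typically combines several outcome attributes into a single score. The Python sketch below shows one common form, a weighted additive utility built from linear single-attribute utilities. The attribute names, weights, and ranges are hypothetical placeholders chosen for illustration; they are not the values used in the demonstration, and in practice the outcome figures would come from running the oil-trajectory model on the generated plan.

```python
from dataclasses import dataclass


@dataclass
class SpillOutcome:
    """Hypothetical outcome attributes predicted for one response plan."""
    shoreline_km_oiled: float
    fraction_recovered: float
    response_cost_musd: float


# Hypothetical weights (summing to 1) and (worst, best) ranges per attribute.
WEIGHTS = {"shoreline": 0.5, "recovered": 0.3, "cost": 0.2}
RANGES = {"shoreline": (100.0, 0.0), "recovered": (0.0, 1.0), "cost": (50.0, 0.0)}


def single_attribute_utility(value: float, worst: float, best: float) -> float:
    """Linear utility: 0 at the worst value, 1 at the best, clipped to [0, 1]."""
    u = (value - worst) / (best - worst)
    return min(1.0, max(0.0, u))


def plan_utility(outcome: SpillOutcome) -> float:
    """Weighted additive utility of a plan's predicted outcome."""
    values = {
        "shoreline": outcome.shoreline_km_oiled,
        "recovered": outcome.fraction_recovered,
        "cost": outcome.response_cost_musd,
    }
    return sum(WEIGHTS[k] * single_attribute_utility(values[k], *RANGES[k]) for k in WEIGHTS)
```

Because each attribute is mapped to [0, 1] before weighting, the plan utility also lies in [0, 1], so plans produced with different candidate operators can be ranked on one scale.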


6  RELATED  WORK 

The  problems  we  have  addressed  in  the  work  described  here,  and  the  methods  we  have  used 
to  solve  the  problems,  are  similar  to  problems  and  methods  described  in  recent  research  on 
experiment  generation,  knowledge  acquisition,  and  learning  apprentices.  In  addition,  several 
researchers  are  studying  ways  to  improve  the  performance  of  planning  systems  via  machine 
learning. 
Gil  [1992]  describes  research  on  experiment  generation  for  knowledge  acquisition  in
planners.  The  general  approach  she  uses  is  to  identify  missing  preconditions  by  observing  when 
actions  fail,  and  then  to  generate  experiments  to  determine  the  correct  precondition.  Some  of  the 
methods  described  by  Gil  are  applicable  to  the  problem  of  experiment  generation  in  KATY,  but 
many  problems  remain  to  be  solved  (see  Section  8). 
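The core idea of comparing a state in which an action succeeded with one in which it failed can be illustrated with a deliberately simplified Python sketch. This is not Gil's algorithm: states are assumed to be sets of ground literals, `try_action` is a hypothetical hook that executes the action (for example, in a simulator) in a given state, and only single-literal explanations are tested.

```python
from typing import Callable, FrozenSet, List, Optional

State = FrozenSet[str]  # a state as a set of ground literals (assumption)


def candidate_preconditions(success_state: State, failure_state: State) -> List[str]:
    """Literals that held when the action succeeded but not when it failed."""
    return sorted(success_state - failure_state)


def find_missing_precondition(
    success_state: State,
    failure_state: State,
    try_action: Callable[[State], bool],
) -> Optional[str]:
    """Run one experiment per candidate literal, adding it to the failed state.

    `try_action` is a hypothetical hook that executes the action in a state
    (for example, in a simulator) and reports success. Returns the first
    literal whose addition makes the action succeed, or None if no single
    literal explains the failure.
    """
    for literal in candidate_preconditions(success_state, failure_state):
        if try_action(failure_state | {literal}):
            return literal
    return None
```

The `None` case is where the remaining problems begin: when a failure is caused by a conjunction of conditions, or by noise, the number of candidate experiments grows quickly and a more selective strategy is needed.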

Much  of  the  research  in  the  knowledge  acquisition  community  has  focused  on  structuring  the 
global  knowledge  acquisition  process.  EXPECT  is  a  knowledge  acquisition  architecture  that 
dynamically  forms  expectations  about  the  knowledge  that  a  problem-solving  system  needs  to 
acquire,  and  then  uses  these  expectations  to  interactively  guide  the  user  through  the  knowledge 
acquisition  process  [Gil  and  Swartout  1994].  Davis  [1993]  describes  the  use  of  metalevel 
knowledge  in  TEIRESIAS,  an  expert  system  for  stock  market  investment  advising,  to  guide  the
identification  of  new  rules  to  be  added  to  the  expert  system.  The  metalevel  knowledge  allows  the  system
to  “know  what  it  knows,”  and  therefore  to  identify  and  repair  bugs  in  its  knowledge  base  (missing 
or  incorrect  knowledge).  Eshelman  et  al.  [1993]  describe  MOLE,  a  knowledge  acquisition  system 
for  heuristic  problem  solving.  MOLE  generates  an  initial  knowledge  base  interactively,  and  then 
detects  and  corrects  problems  by  identifying  “differentiating  knowledge”  that  distinguishes  among 
alternative  hypotheses.  Ginsberg,  Weiss,  and  Politakis  [1993]  have  developed  SEEK,  which 
performs  knowledge  base  refinement  by  using  a  case  base  to  generate  plausible  suggestions  for  rule 
refinement.  These  methods,  which  view  the  knowledge  base  as  a  whole,  complement  the  Operator 
Learner’s  approach  of  focusing  on  refining  individual  operators. 

Learning  apprentices  are  a  recent  development  in  knowledge  acquisition  tools.  Mitchell, 
Mahadevan,  and  Steinberg  [1993]  characterize  a  learning  apprentice  as  an  “interactive, 
knowledge-based  consultant”  that  observes  and  analyzes  the  problem-solving  behavior  of  users. 
One  advantage  of  a  learning  apprentice  is  that  it  runs  continuously  as  the  system  is  used  by
a  wide  range  of  users;  thus,  the  evolving  knowledge  base  reflects  a  broad  range  of  expertise.  These 
researchers  developed  the  LEAP  apprentice,  which  uses  explanation-based  learning  (EBL) 
techniques  to  explain  and  generalize  cases  (traces  of  the  user’s  problem-solving  behavior)  in  the 
domain  of  digital  circuits.  DISCIPLE  [Kodratoff  and  Tecuci  1993]  also  uses  EBL,  as  well  as 
similarity-based  learning,  to  acquire  problem-solving  knowledge  in  the  domain  of  design  for  the 
manufacturing  of  loudspeakers.  The  Operator  Learner  is  similar  to  a  learning  apprentice  in  its 
mode  of  learning  from  the  user,  but  uses  inductive  methods  rather  than  EBL,  allowing  the  system 
to  acquire  a  broader  range  of  new  knowledge  without  the  need  for  domain  theories.  Since  we  also 
learn  from  external  simulators,  there  is  less  burden  on  the  user  to  provide  a  complete  set  of  training 
examples  from  which  the  system  learns. 

Wang  and  Veloso  [1994]  have  developed  a  system  that  inductively  learns  planning  control 
knowledge.  Their  system  makes  some  simplifying  assumptions  (e.g.,  that  there  is  no  randomness, 
and  that  the  system  has  a  complete  domain  representation)  that  limit  the  applicability  of  their 
approach  to  complex,  real-world  domains.  KATY  permits  randomness,  and  allows  the  user  to  guide 
the  learning  process  using  QCs,  which  we  believe  to  be  critical  for  large-scale  domains. 

Calistri-Yeh  and  Segre  [1994]  describe  an  ARPI-sponsored  adaptive  learning  and  planning 
system  (ALPS).  The  primary  mechanism  for  learning  within  their  system  is  the  use  of  a  set  of 
speedup  learning  techniques  to  improve  planning  performance.  Their  Probabilistic  Theory 
Revision  mechanism  refines  incorrect  or  incomplete  domain  theories,  and  can  be  viewed  as  a  type 
of  learning  or  adaptation.  Veloso  and  Borrajo  [1994]  use  a  combination  of  bounded  explanation 
and  inductive  generalization  to  learn  control  rules  for  planning  systems.  Learning  control  rules  is
a  slightly  different  problem  from  that  addressed  by  KATY:  the  former  focuses  on  improving  the
efficiency  of  the  planning  process,  whereas  the  latter  is  concerned  with  its  correctness.  These  two 
methods  could  be  combined  in  order  to  improve  performance  along  both  dimensions 
simultaneously. 


7  PUBLICATIONS 

The  following  publications  were  written  during  the  contract.  These  publications  are  also 
included  in  the  list  of  references  in  Section  9. 

•  desJardins,  M.  1994a.  “Evaluation  of  Learning  Biases  using  Probabilistic  Domain 
Knowledge,”  in  Computational  Learning  Theory  and  Natural  Learning  Systems, 
Vol.  2,  eds.  S.J.  Hanson  et  al.,  the  MIT  Press,  Cambridge,  Massachusetts.

•  desJardins,  M.  1994b.  “Knowledge  Acquisition  Tools  for  a  Military  Planning 
System,”  presented  at  the  1994  IEEE  Conference  on  Tools  with  AI,  New  Orleans, 
Louisiana  (November);  in  Proc.  1994  IEEE  Conference  on  Tools  with  AI,  Morgan 
Kaufmann  Publishers  Inc.,  San  Francisco,  California. 

•  desJardins,  M.  1994c.  “Knowledge  Development  Methods  for  Planning  Systems,” 
presented  at  the  AAAI  Fall  Symposium  on  Planning  and  Learning,  New  Orleans, 
Louisiana  (November);  in  Working  Notes  of  the  AAAI  Fall  Symposium  on  Planning 
and  Learning,  AAAI  Press,  Menlo  Park,  California. 

•  desJardins,  M.  1994d.  “The  Use  of  Relevance  to  Evaluate  Learning  Biases,” 
presented  at  the  AAAI  Fall  Symposium  on  Relevance,  New  Orleans,  Louisiana 
(November);  in  Working  Notes  of  the  AAAI  Fall  Symposium  on  Relevance,  AAAI 
Press,  Menlo  Park,  California. 

•  Gordon,  D.F.,  and  M.  desJardins.  1995.  “Evaluation  and  Selection  of  Biases  in 
Machine  Learning,”  Machine  Learning  20(1/2),  pp.  5-22  (July/August). 

•  desJardins,  M.  1995.  “Goal-Directed  Learning:  A  Decision-Theoretic  Model  for 
Deciding  What  to  Learn  Next,”  in  Goal-Driven  Learning,  eds.  A.  Ram  and 
D.B.  Leake,  the  MIT  Press,  Cambridge,  Massachusetts,  pp.  241-250.

•  desJardins,  M.  1996.  “Knowledge  Acquisition  Tools  for  Planning  Systems,”  in 
Advanced  Planning  Technology:  Technological  Achievements  of  the  ARPA/Rome 
Laboratory  Planning  Initiative,  ed.  A.  Tate,  AAAI  Press,  Menlo  Park,  California. 


8  CONCLUSIONS  AND  FUTURE  WORK 

Automated  and  semiautomated  tools  for  knowledge  acquisition  will  become  increasingly 
essential  as  large-scale  planning  systems  are  developed  and  deployed.  Our  research  in  this  area  has 
led  to  initial  prototypes  of  two  types  of  tools:  interactive  graphical  editors  for  developing  planning 
knowledge,  and  an  inductive  learning  system  that  uses  simulator  feedback  and  the  user’s  planning 
choices  to  refine  and  verify  partial  operators. 
On  the  basis  of  ongoing  usage  and  evaluation  of  these  prototypes,  we  believe  that  both  the
editor  and  the  learning  system  form  essential  parts  of  a  knowledge  developer’s  toolkit  for 
constructing  large-scale  planning  applications.  The  key  advantages  these  tools  provide,  which  will 
enable  the  development,  deployment,  and  ongoing  maintenance  of  realistic  planning  applications, 
are 

•  Template-based  editing  methods  for  constructing  individual  planning  operators, 
reducing  the  likelihood  of  errors  in  the  development  process,  and  reducing  the 
tedium  of  operator  construction 

•  A  framework  for  acquiring  knowledge  from  multiple  sources  (plan  evaluation 
modules,  simulators,  and  expert  planner  behavior),  guided  by  initial  approximations 
provided  by  a  knowledge  developer

These  tools  focus  on  the  development  and  refinement  of  individual  planning  operators.  We 
also  recognize  the  need  for  additional  tools  in  KATY,  particularly  those  that  guide  and  manage  the 
development  of  the  knowledge  base  as  a  whole.  Some  such  tools  are  being  developed  by  others  in 
the  planning  research  community;  for  example,  researchers  at  the  Jet  Propulsion  Laboratory  have 
developed  specialized  techniques  for  assessing  the  consistency  and  completeness  of  a  knowledge 
base  [Chien  1996];  and  ISI’s  EXPECT  project  provides  a  framework  for  managing  a  structured 
knowledge  acquisition  process  [Gil  and  Swartout  1994].  These  methods  could  be  generalized  and 
applied  within  the  SIPE-2  framework.

Future  Work.  Many  of  the  problems  raised  during  this  work  have  not  been  addressed  in  depth 
by  the  machine  learning  research  community.  Most  of  the  current  research  focuses  on  algorithms 
and  methods  for  inductive  or  explanation-based  learning.  While  developing  good  inductive 
learning  methods  is  important  (and  we  list  some  specific  research  directions  in  that  area  below), 
this  work  also  points  to  the  need  for  supporting  technologies  that  will  enable  the  effective 
application  of  inductive  learning  methods.  These  supporting  technologies  include  representing  and 
reasoning  about  bias,  experiment  generation,  knowing  when  to  learn  and  when  to  stop  learning, 
and  evaluation  methods  for  complex  forms  of  learned  knowledge. 

The  qualitative  constraints  used  by  the  Operator  Learner  provide  a  way  for  the  knowledge
developer  to  feed  partial  knowledge  into  the  system,  without  having  to  specify  all  of  the  details  of 
an  operator’s  preconditions.  In  machine  learning  terminology,  the  developer  is  imposing  a  bias  on 
the  learning  system.  Generalizing  the  representation  and  implementation  of  qualitative  constraints 
would  broaden  the  types  of  bias  that  could  be  introduced  into  this  process. 
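As a concrete illustration of a qualitative constraint acting as a bias, consider learning a one-attribute numeric precondition from simulator-labelled examples. The Python sketch below assumes a hypothetical constraint of the form "success is monotonically non-decreasing in this attribute" (or non-increasing), which restricts the hypothesis space to thresholds in one direction. The representation and the error-minimising search are illustrative choices, not the Operator Learner's actual implementation.

```python
from typing import List, Tuple

Example = Tuple[float, bool]  # (attribute value, did the operator succeed?)


def learn_threshold(examples: List[Example], direction: int) -> Tuple[float, int]:
    """Learn a one-attribute precondition under a monotonicity constraint.

    direction=+1 encodes the hypothetical constraint "success is non-decreasing
    in this attribute", so only hypotheses of the form value >= t are searched;
    direction=-1 encodes the opposite constraint (value <= t). Candidate
    thresholds are the observed values. Returns (t, training_errors) for the
    first threshold with the fewest errors.
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    if not examples:
        raise ValueError("need at least one example")
    best_t, best_err = 0.0, len(examples) + 1
    for t in sorted({v for v, _ in examples}):
        if direction == 1:
            errors = sum((v >= t) != label for v, label in examples)
        else:
            errors = sum((v <= t) != label for v, label in examples)
        if errors < best_err:
            best_t, best_err = t, errors
    return best_t, best_err
```

On data where success appears once the attribute reaches 3, `direction=1` recovers a zero-error threshold of 3, while `direction=-1` can do no better than two errors. A poor best fit under the supplied constraint is one way a learner could notice that the developer's bias conflicts with the evidence.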

The  experiment  generation  capabilities  that  we  developed  for  this  project  were  tailored  for  the 
small  learning  problems  we  examined.  In  large-scale  domains,  methods  will  be  needed  to  select  an 
appropriate  subset  of  experiments  to  guide  the  learning  process.  Very  little  work  has  been  done  in 
this  area;  the  techniques  developed  by  Gil  [1992]  provide  some  interesting  ideas  for  directions,  but 
are  limited  to  fairly  simple  domains. 

The  development  of  stopping  criteria  that  enable  the  system  to  know  when  to  stop  learning  is 
directly  related  to  the  problem  of  experiment  generation:  to  construct  an  efficient  sequence  of 
experiments,  one  must  know  when  enough  experimentation  has  been  performed  (or,  equally 
important,  when  experimentation  is  not  yielding  a  useful  answer,  and  other  approaches  should  be 
tried). 
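One simple form of stopping criterion, sketched below in Python under stated assumptions, monitors a success-rate estimate while experiments run one at a time. It stops when the confidence interval is narrow enough (enough experimentation has been performed) or when a fixed budget is exhausted without reaching that precision (a signal that other approaches should be tried). `experiment` is a hypothetical hook for one simulator run, and the normal-approximation interval is a simplification.

```python
import math
import random
from typing import Callable, Tuple


def run_until_confident(
    experiment: Callable[[random.Random], bool],
    target_half_width: float,
    budget: int,
    min_runs: int = 10,
    seed: int = 0,
    z: float = 1.96,
) -> Tuple[str, int, float]:
    """Run experiments one at a time and decide when to stop.

    Returns (reason, runs_used, estimated_success_rate). reason is
    "converged" once the normal-approximation confidence half-width drops
    below target_half_width, or "budget_exhausted" if the budget runs out
    first, a cue to try a different approach rather than keep experimenting.
    """
    if budget < 1:
        raise ValueError("budget must be at least 1")
    rng = random.Random(seed)
    successes = 0
    for n in range(1, budget + 1):
        if experiment(rng):
            successes += 1
        p = successes / n
        half_width = z * math.sqrt(p * (1 - p) / n)
        # min_runs keeps the loop from "converging" on the first few runs,
        # where p is often 0 or 1 and the approximate width is zero.
        if n >= min_runs and half_width < target_half_width:
            return "converged", n, p
    return "budget_exhausted", budget, successes / budget
```

The two return reasons correspond to the two cases distinguished above: enough experimentation, or experimentation that is not yielding a useful answer within the resources available.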
Evaluating  the  learned  knowledge  is  a  nontrivial  problem  in  general,  and  particularly  difficult
in  a  planning  domain.  Inductive  learning  methods  are  usually  evaluated  against  a  test  set  of 
examples  that  are  drawn  from  the  same  sample  population  as  the  training  examples.  In  the  case  of 
planning  knowledge,  it  may  be  difficult  to  generate  a  test  set,  and,  more  importantly,  the  real 
effectiveness  of  a  planning  operator  can  be  determined  only  by  using  it  in  the  planning  process. 
Therefore,  evaluating  the  learning  process  corresponds  to  evaluating  generated  plans,  which  is  an 
open  problem  for  most  application  domains. 

Finally,  the  inductive  learning  methods  that  have  been  developed  by  the  machine  learning 
research  community  have  generally  focused  on  a  batch  learning  situation  (where  all  training 
examples  are  available  at  the  outset  of  learning),  predicting  discrete  classes  using  a  well-defined  set
of  input  features  (e.g.,  predicting  a  disease  type  from  a  set  of  symptoms).  Our  observation  is  that 
for  planning  problems,  and  for  a  learning  context  where  data  is  expensive  to  collect  and  may  arrive 
continuously  over  the  lifetime  of  the  system,  different  methods  are  needed.  In  particular, 
incremental  learning  methods  that  continuously  revise  hypotheses  as  new  data  arrives  are 
necessary.  These  learning  methods  must  be  able  to  predict  numerical  values  (e.g.,  the  expected 
degree  of  success  of  an  operation,  time  to  complete  an  activity,  or  amount  of  a  resource  required), 
and  must  be  able  to  reason  at  a  meta-level  about  the  representation  they  use  (e.g.,  they  should  be 
able  to  recognize  when  the  domain  representation  should  be  extended,  as  when  no  good  theory  can 
be  learned  by  using  the  current  representation). 
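To make the incremental, numeric-prediction requirement concrete, the Python sketch below maintains a one-feature linear regression that is revised after every new observation, using Welford-style running means and co-moments so that no past examples need to be stored. The feature and target (for example, an amount of a resource and the time to complete an activity) are hypothetical, and a linear model is only the simplest instance of the kind of method called for here; it does not address the meta-level reasoning about representation.

```python
from typing import Tuple


class OnlineLinearRegression:
    """Incrementally fitted least-squares line y = intercept + slope * x."""

    def __init__(self) -> None:
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0  # running sum of squared deviations of x
        self.c_xy = 0.0  # running sum of co-deviations of x and y

    def update(self, x: float, y: float) -> None:
        """Fold one new observation into the model in O(1) time and memory."""
        self.n += 1
        dx = x - self.mean_x  # deviation from the old mean of x
        self.mean_x += dx / self.n
        self.mean_y += (y - self.mean_y) / self.n
        self.m2_x += dx * (x - self.mean_x)  # uses the new mean of x
        self.c_xy += dx * (y - self.mean_y)  # uses the new mean of y

    def coefficients(self) -> Tuple[float, float]:
        """Return (slope, intercept); slope stays 0 until x has varied."""
        if self.m2_x == 0.0:
            return 0.0, self.mean_y
        slope = self.c_xy / self.m2_x
        return slope, self.mean_y - slope * self.mean_x

    def predict(self, x: float) -> float:
        slope, intercept = self.coefficients()
        return intercept + slope * x
```

Each call to `update` revises the current hypothesis immediately, so predictions are available at any point in the system's lifetime rather than only after a batch of training data has been collected.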

We  believe  that  the  prototypes  we  have  developed  demonstrate  the  utility  and  necessity  of 
providing  tools  for  knowledge  acquisition  in  the  development  of  planning  applications.  The  tools 
in  their  current  form,  particularly  the  Operator  Editor,  already  provide  useful  functionality,  and  are
being  used  to  support  knowledge  development  in  an  ongoing  process.  However,  this  work  has  also 
highlighted  the  need  for  additional  research  directions  in  machine  learning  and  knowledge 
acquisition. 